Abstract. We develop general methods to obtain fast (polynomial time) estimates
of the cardinality of a combinatorially defined set by solving randomly generated
optimization problems on the set. Geometrically, we estimate the cardinality of
a subset of the Boolean cube via the average distance from a point in the cube to the
subset. As an application, we present a new randomized polynomial time algorithm
which approximates the permanent of a 0-1 matrix by solving a small number of
Assignment problems.



A general problem of combinatorial counting can be stated as follows: given a
family $\mathcal{F} \subset 2^X$ of subsets of the ground set $X$, compute or estimate the
cardinality $|\mathcal{F}|$ of the family. We would like to do the computation efficiently, in
polynomial time. Of course, one should clarify what "given" means, especially since in most
interesting cases $|\mathcal{F}|$ is exponentially large in the cardinality $|X|$ of the ground set.
Following the earlier paper [Barvinok 97a], we assume that the family $\mathcal{F}$ is defined
by its Optimization Oracle:

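(1.1) Optimization Oracle defining a family $\mathcal{F} \subset 2^X$
Input: A set of integer weights $\gamma_x$, $x \in X$.
Output: The number $\min_{Y \in \mathcal{F}} \sum_{x \in Y} \gamma_x$.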



That is, for any given integer weighting $\{\gamma_x\}$ on the set $X$, we should be able
to produce the minimum weight of a subset $Y \in \mathcal{F}$. As is discussed in [Barvinok
97a], for many important families $\mathcal{F}$ the Optimization Oracle is readily available.
The following example is central for this paper.
(1.2) Example: Perfect matchings in a graph. Let $G = (V,E)$ be a graph
with the set $V$ of vertices and set $E$ of edges. We assume that $G$ has no loops
(edges whose endpoints coincide) and no isolated vertices. A set $M \subset E$ of edges
is called a matching in $G$ if every vertex of $G$ is incident to at most one edge from
$M$. A matching $M$ is called perfect if every vertex of $G$ is incident to precisely one
edge from $M$. Let $\mathcal{F} \subset 2^E$ be the set of all perfect matchings in $G$. The problem
of computing or estimating $|\mathcal{F}|$ efficiently is one of the hardest and most intriguing
problems of combinatorial counting, see, for example, [Lovasz and Plummer 86],
[Jerrum and Sinclair 89], [Jerrum 95] and [Jerrum and Sinclair 97].

We observe that Optimization Oracle 1.1 can be efficiently constructed. Indeed,
if we assign integer weights $\gamma_e$, $e \in E$, to the edges of the graph, the minimum
weight of a perfect matching can be computed in $O(|V|^3)$ time, see, for example,
Section 11.3 of [Papadimitriou and Steiglitz 98].

A particularly interesting case is that of a bipartite graph $G$ when the vertices of
$G$ are partitioned into two classes, $V = V^+ \cup V^-$, such that every edge $e \in E$ has
one endpoint in $V^+$ and the other in $V^-$. Then the number of perfect matchings in
$G$ is equal to the permanent of a 0-1 matrix associated with $G$ (see also Section 5).
In this case, the corresponding optimization problem is known as the Assignment
Problem. It is not only "theoretically easy", but in practice large instances are
routinely solved, as the Assignment Problem is a particular case of the minimum
cost network flow problem (see, for example, Section 11.2 of [Papadimitriou and
Steiglitz 98]).

Other interesting and generally difficult problems of combinatorial counting
where the Optimization Oracle is provided by classical combinatorial optimization
algorithms include counting bases in matroids, counting independent sets in
matroids and counting bases in the intersection of two matroids over the same
ground set, see [Jerrum and Sinclair 97] for a discussion of the counting problems
and [Papadimitriou and Steiglitz 98] for a description of the underlying optimization
algorithms. Particularly interesting special cases of those problems include
counting spanning trees, counting forests and counting spanning subgraphs in a
given graph and counting non-degenerate maximal minors in a given rectangular
matrix over GF(2). Some of the problems, such as counting spanning trees, admit
a simple and efficient solution; others, such as counting matchings of all sizes in a
graph, are known to be hard to solve exactly but can be solved approximately; and
still others, such as counting bases in matroids, are solved only in special cases.
The problem of counting perfect matchings in a given graph, arguably the most
famous problem of them all, still resists all attempts to solve it in full generality
(see also Section 5).

The most general approach to combinatorial counting has been via the Monte Carlo
method. The key component of the method is the ability to sample a random point
from the (almost) uniform distribution on $\mathcal{F}$. Often, to achieve this, a Markov chain
on the set $\mathcal{F}$ is generated, so that it converges rapidly to the uniform distribution on
$\mathcal{F}$ (see [Jerrum and Sinclair 97] for a survey). This approach resulted, for example,
in finding a polynomial time randomized algorithm to count matchings of all sizes
in a given graph with a prescribed accuracy [Jerrum and Sinclair 89]. When the
Markov chain approach works, it produces incomparably better results than the
method of this paper. However, for many important counting problems, some of
which are mentioned above, it is either not clear how to generate a rapidly mixing
Markov chain or, when there is a "natural" candidate, it seems to be extremely
hard to prove that the chain is indeed converging rapidly enough to the steady
state (cf. [Jerrum and Sinclair 97]). In contrast, our approach produces very crude
bounds, but it is totally insensitive to the fine structure of $\mathcal{F}$, so it is ready to
handle a broad class of problems. In [Barvinok 97a], it was shown that the method
allows one to decide whether the size $|\mathcal{F}|$ is exponentially large in the size $|X|$ of the
ground set in some precisely defined sense. In this paper, we improve the estimates
of [Barvinok 97a] in several directions and apply them to new problems, notably to
the problem of estimating the permanent of a given 0-1 matrix.

The main idea of our approach is as follows. Given a family $\mathcal{F}$, we identify it
with a subset $F$ of a metric space $\Omega$ such that for any given point $x \in \Omega$, the
distance

\[ d(x,F) = \min_{y \in F} d(x,y) \]

can be quickly computed using Optimization Oracle 1.1 for $\mathcal{F}$. Then we estimate
the cardinality $|F|$ from the distance $d(x,F)$ for a typical $x \in \Omega$. Intuitively, if $|F|$
is small, we expect the distance $d(x,F)$ from a random point $x \in \Omega$ to be large and
vice versa. In this paper, $\Omega$ is the Boolean cube $\{0,1\}^n$ and $d$ is either the Hamming
distance or its modification, although, as we discuss in Section 7, some other
possibilities may be of interest. Thus our approach can be considered as a refinement
of the classical Monte Carlo method: we not only register how often a randomly
sampled point $x \in \Omega$ lands in the target set $F$, but also take into account the
distance $d(x,F)$. This allows us to get non-trivial bounds even when $|F|$ is
exponentially small with respect to $|\Omega|$, so that $x$ typically misses $F$.

The paper is organized as follows. 

In Section 2, we introduce a "geometric cousin" of Optimization Oracle 1.1.
Distance Oracle 2.2 describes a subset $F$ of the Boolean cube $\{0,1\}^m$ by computing
a suitably defined distance $d$ from a given point in the cube to the set. We show
how to construct embeddings $\phi : \mathcal{F} \to \{0,1\}^m$, so that the Distance Oracle for
the image $F = \phi(\mathcal{F})$ is derived from the Optimization Oracle for $\mathcal{F}$. We show that
in some important cases (for example, when $\mathcal{F}$ is the set of perfect matchings in a
graph), we can "squeeze" $\mathcal{F}$ into a substantially smaller cube than we would have
expected for a general family $\mathcal{F}$.

In Section 3, we describe the bounds obtained by choosing $d$ to be the Hamming
distance in the cube. The bounds are sharp, meaning that we can't possibly estimate
(in polynomial time) the cardinality of a subset $F \subset \{0,1\}^n$ better if the only
information available is the Hamming distance from any given point $a \in \{0,1\}^n$
to the set $F$. Remarkably, the lower and the upper bound for $\alpha = n^{-1}\log_2|F|$
converge when $\alpha \approx 0$ or $\alpha \approx 1$ and diverge the most when $\alpha = 1/2$.

In Section 4, we describe how to get better bounds for small sets by using a
suitably defined "randomized Hamming distance", which ignores a (random) part
of the information contained in the standard Hamming distance. The isoperimetric
problems arising here seem to be interesting in their own right. The proofs are not
complicated but somewhat lengthy and are therefore postponed till Section 6.

In Section 5, we apply our methods to get a new polynomial time algorithm to
approximate the permanent of a given 0-1 matrix. Geometrically, we represent the
set of the perfect matchings of the underlying bipartite graph on $n + n$ vertices
as a subset $F$ of the Boolean cube $\{0,1\}^m$ with $m = O(n \ln n)$ and estimate $|F|$
from the Hamming distance of a random point in the cube to $F$. We find the
distance in question by averaging solutions of some randomly generated Assignment
problems. We compare our method with other algorithms available in the literature.
In particular, we show that our method allows us to recognize $n \times n$ matrices whose
permanents are subexponential in $n$ (Corollary 5.4).

In Section 6, we supply proofs of the results of Section 4. 

In Section 7, we discuss possible ramifications of our approach and its relations 
with the Monte-Carlo method. 

2. Distance Oracle and Cubical Embeddings 

The idea of our method is to represent $\mathcal{F}$ geometrically as a subset $F$ of the
Boolean cube and then derive estimates of $|\mathcal{F}|$ using the average distance from a
point in the cube to $F$.

(2.1) Definitions. Let $C_n = \{0,1\}^n$ be the Boolean cube and let dist be the
Hamming distance in $C_n$, that is

\[ \operatorname{dist}(a,b) = \sum_{i:\ \alpha_i \ne \beta_i} 1 \quad \text{for } a = (\alpha_1, \ldots, \alpha_n),\ b = (\beta_1, \ldots, \beta_n) \in C_n. \]

More generally, let us fix $n$ functions $d_i : \{0,1\} \times \{0,1\} \to \mathbb{Z}$, $i = 1, \ldots, n$, which
we interpret as penalties. We assume that $d_i \ge 0$ and that $d_i(0,0) = d_i(1,1) = 0$.
Finally, let

\[ d(a,b) = \sum_{i=1}^{n} d_i(\alpha_i, \beta_i), \quad \text{where } a = (\alpha_1, \ldots, \alpha_n) \text{ and } b = (\beta_1, \ldots, \beta_n), \]

be the distance function determined by the penalties $\{d_i\}$.
If $d_i(\alpha, \beta) = 1$ whenever $\alpha \ne \beta$ then $d(a,b) = \operatorname{dist}(a,b)$.
For a subset $B \subset C_n$ and a point $a \in C_n$ let

\[ d(a,B) = \min_{b \in B} d(a,b) \]

be the distance from $a$ to $B$. In particular, let

\[ \operatorname{dist}(a,B) = \min_{b \in B} \operatorname{dist}(a,b) \]

be the Hamming distance from a point $a$ to the subset $B$.

We will be working with the following "geometric cousin" of Optimization Oracle 
1.1. 
(2.2) Distance Oracle defining a set $F \subset C_n$

Input: A point $a \in C_n$ and penalties $d_i : \{0,1\} \times \{0,1\} \to \mathbb{Z}$, $i = 1, \ldots, n$.
Output: The number $d(a,F)$.

There is an obvious way to associate with a family $\mathcal{F} \subset 2^X$ a subset $F \subset C_{|X|}$
of the Boolean cube.

(2.3) Straightforward embedding. Let us identify the ground set $X$ with the
set $\{1, \ldots, n\}$, $n = |X|$. Let $\mathcal{F}$ be a family of subsets of $\{1, \ldots, n\}$ given by its
Optimization Oracle. For a subset $Y \in \mathcal{F}$ let us define the indicator $y \in C_n$,
$y = (\eta_1, \ldots, \eta_n)$, by

\[ \eta_i = \begin{cases} 1 & \text{if } i \in Y, \\ 0 & \text{if } i \notin Y. \end{cases} \]

Let $F = \{y \in C_n : Y \in \mathcal{F}\}$ be the set of all indicators of subsets $Y \in \mathcal{F}$.

Let us construct the Distance Oracle for the set $F \subset C_n$. Given a point
$a = (\alpha_1, \ldots, \alpha_n) \in C_n$ and penalties $d_i$, $i = 1, \ldots, n$, let us define weights $\gamma_i$
by $\gamma_i = d_i(\alpha_i, 1) - d_i(\alpha_i, 0)$. Then for a set $Y \subset \{1, \ldots, n\}$ and its indicator
$y = (\eta_1, \ldots, \eta_n) \in C_n$, we have

\[ \sum_{i \in Y} \gamma_i = \sum_{i \in Y} \bigl(d_i(\alpha_i, 1) - d_i(\alpha_i, 0)\bigr) = \sum_{i=1}^{n} d_i(\alpha_i, \eta_i) - \sum_{i=1}^{n} d_i(\alpha_i, 0) = d(a,y) - d(a,0). \]

Hence, given the output

\[ \lambda = \min_{Y \in \mathcal{F}} \sum_{i \in Y} \gamma_i \]

of Oracle 1.1 for the family $\mathcal{F}$, we can easily compute the output

\[ d(a,F) = \lambda + d(a,0) \]

of Oracle 2.2 for the set $F$. Thus, given an Optimization Oracle 1.1 for a family
$\mathcal{F} \subset 2^X$, we can efficiently construct a Distance Oracle 2.2 for a set $F \subset C_n$,
$n = |X|$, such that $|F| = |\mathcal{F}|$.
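The reduction in (2.3) can be illustrated by a minimal Python sketch. Everything below is an assumption made for illustration: the brute-force `optimization_oracle` merely stands in for a real combinatorial optimization algorithm, the ground set is indexed from 0 rather than 1, and the family (all 2-element subsets of a 4-element set) is a toy example, not one taken from the paper.

```python
from itertools import combinations

def optimization_oracle(family, weights):
    """Stand-in for Oracle 1.1: minimum total weight of a member of the family.

    Brute force, for illustration only (hypothetical helper).
    """
    return min(sum(weights[i] for i in Y) for Y in family)

def distance_oracle(family, a, penalties):
    """Oracle 2.2 for the indicators of `family`, obtained from Oracle 1.1."""
    n = len(a)
    # gamma_i = d_i(alpha_i, 1) - d_i(alpha_i, 0)
    gamma = [penalties[i](a[i], 1) - penalties[i](a[i], 0) for i in range(n)]
    lam = optimization_oracle(family, gamma)
    # d(a, 0): distance from a to the all-zero point
    d_a_zero = sum(penalties[i](a[i], 0) for i in range(n))
    return lam + d_a_zero

def distance_by_definition(family, a, penalties):
    """d(a, F) computed directly as a minimum over all indicators."""
    n = len(a)
    return min(
        sum(penalties[i](a[i], 1 if i in Y else 0) for i in range(n))
        for Y in family
    )

def hamming_penalty(s, t):
    return int(s != t)

n = 4
family = [frozenset(c) for c in combinations(range(n), 2)]  # toy family
a = (1, 1, 1, 0)
penalties = [hamming_penalty] * n
via_oracle = distance_oracle(family, a, penalties)
direct = distance_by_definition(family, a, penalties)
```

Here the weights are $\gamma = (-1, -1, -1, 1)$, so $\lambda = -2$ and $d(a,0) = 3$, and both computations give Hamming distance 1.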

To be able to estimate the cardinality $|\mathcal{F}|$ with a better precision, we would like
to embed $\mathcal{F}$ into a smaller Boolean cube. Sometimes this is indeed possible.

(2.4) Economical embedding. Suppose that the ground set $X$ can be represented
as a union $X = X_1 \cup \ldots \cup X_k$ of (not necessarily disjoint) parts $X_i$, so that
$|Y \cap X_i| = 1$ for every subset $Y \in \mathcal{F}$ and every $X_i$. In other words, every member
of $\mathcal{F}$ is a transversal of the cover of $X$ by $X_1, \ldots, X_k$. Let

\[ m_i = \lceil \log_2 |X_i| \rceil \quad \text{and} \quad m = \sum_{i=1}^{k} m_i. \]

We construct an embedding $\phi : \mathcal{F} \to C_m$ as follows.

First, we index the elements of $X_i$ by distinct binary strings of length $m_i$, that
is, we choose an embedding $\phi_i : X_i \to C_{m_i}$. Thus for any $x \in X_i$ the point $\phi_i(x)$
is a binary string of length $m_i$ and $\phi_i(x) \ne \phi_i(y)$ provided $x \ne y$.

Let us identify

\[ C_m = C_{m_1} \times \ldots \times C_{m_k}. \]

For a subset $Y \in \mathcal{F}$, let us define $y \in C_m$ as

\[ y = (y_1, \ldots, y_k), \quad \text{where } y_i = \phi_i(Y \cap X_i) \in C_{m_i}. \]

Note that $y$ is well-defined, since every intersection $Y \cap X_i$ consists of a single point.
Let $F = \{y \in C_m : Y \in \mathcal{F}\}$. Clearly, $|F| = |\mathcal{F}|$.

Given an Optimization Oracle 1.1 for $\mathcal{F}$, let us construct a Distance Oracle 2.2
for $F$. The input of Oracle 2.2 consists of a point $a \in C_m$ (binary string of length
$m$) and penalty functions $\{d_i : i = 1, \ldots, m\}$. We view $a$ as

\[ a = (a_1, \ldots, a_k), \quad \text{where } a_i \in C_{m_i}. \]

The penalties $d_i$, $i = 1, \ldots, m$, give rise to the distance function $d$ on binary
strings, cf. Definition 2.1. For a point $x \in X$, let us define its weight $\gamma_x$ by

(2.4.1) \[ \gamma_x = \sum_{i:\ x \in X_i} d(a_i, \phi_i(x)). \]

Let $Y \in \mathcal{F}$ be a set and let $y \in C_m$ be the point representing $Y$. We observe that

\[ \sum_{x \in Y} \gamma_x = \sum_{x \in Y} \sum_{i:\ x \in X_i} d(a_i, \phi_i(x)) = \sum_{i=1}^{k} d(a_i, y_i) = d(a,y). \]

Hence, the outputs of Oracles 1.1 and 2.2 coincide:

\[ \min_{Y \in \mathcal{F}} \sum_{x \in Y} \gamma_x = \min_{y \in F} d(a,y). \]

Thus, given an Optimization Oracle 1.1 for a family $\mathcal{F} \subset 2^X$, we can efficiently
construct a Distance Oracle 2.2 for a set $F \subset C_m$, such that $|F| = |\mathcal{F}|$. More
precisely, given a point $a \in C_m$ and penalties $\{d_i\}$, we compute weights $\{\gamma_x\}$ on $X$
by (2.4.1) in $O(k|X| \ln |X|)$ time and then apply Optimization Oracle 1.1 to find
the minimum weight $\lambda$ of a subset $Y \in \mathcal{F}$ in this weighting. The distance $d(a,F)$
is equal to $\lambda$.
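The construction in (2.4) can be sketched in Python on a small instance. The choices below are assumptions for illustration only: the family is the set of perfect matchings of the complete bipartite graph $K_{3,3}$, the parts $X_i$ are the edges at the $i$-th left vertex, the penalties are the Hamming penalties, the oracle is replaced by brute-force minimization, and all helper names are hypothetical.

```python
from itertools import permutations, product

def ceil_log2(s):
    """ceil(log2(s)) for an integer s >= 1."""
    return (s - 1).bit_length()

def index_parts(parts):
    """Choose injective maps phi_i : X_i -> {0,1}^{m_i} (binary expansions)."""
    phis = []
    for part in parts:
        m_i = ceil_log2(len(part))
        phis.append({x: tuple((idx >> b) & 1 for b in range(m_i))
                     for idx, x in enumerate(part)})
    return phis

def hamming(s, t):
    return sum(u != v for u, v in zip(s, t))

def weights(parts, phis, a_blocks):
    """Formula (2.4.1), specialized to Hamming penalties."""
    gamma = {}
    for part, phi, a_i in zip(parts, phis, a_blocks):
        for x in part:
            gamma[x] = gamma.get(x, 0) + hamming(a_i, phi[x])
    return gamma

def embed(Y, parts, phis):
    """Point y = (y_1, ..., y_k) representing a transversal Y."""
    blocks = []
    for part, phi in zip(parts, phis):
        (x,) = [e for e in part if e in Y]  # Y meets X_i in exactly one element
        blocks.append(phi[x])
    return tuple(blocks)

k = 3
parts = [[(i, j) for j in range(k)] for i in range(k)]  # edges at left vertex i
phis = index_parts(parts)
family = [frozenset((i, p[i]) for i in range(k)) for p in permutations(range(k))]
m = sum(ceil_log2(len(part)) for part in parts)  # dimension of the small cube

def oracles_agree(a_blocks):
    gamma = weights(parts, phis, a_blocks)
    oracle_value = min(sum(gamma[x] for x in Y) for Y in family)
    distance = min(
        sum(hamming(a_i, y_i) for a_i, y_i in zip(a_blocks, embed(Y, parts, phis)))
        for Y in family
    )
    return oracle_value == distance

# Check every point a of the cube C_m, split into k blocks of length 2.
all_agree = all(oracles_agree(blocks)
                for blocks in product(product((0, 1), repeat=2), repeat=k))
```

In this instance the economical cube has dimension $m = 6$, compared with $|E| = 9$ for the straightforward embedding, and the six matchings map to six distinct points.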




(2.5) Example: Embedding perfect matchings. Let $\mathcal{F}$ be the family of all
perfect matchings in a graph $G = (V,E)$, see Example 1.2. The straightforward
embedding (2.3) identifies $\mathcal{F}$ with a subset $F$ of the Boolean cube $\{0,1\}^{|E|}$ and
provides us with Distance Oracle 2.2 for $F$. We will be better off using the economical
embedding (2.4). Indeed, for a vertex $v \in V$ of $G$, let $E_v$ be the set of edges of
$G$ incident to $v$. Then $E = \bigcup_{v \in V} E_v$ and every perfect matching has exactly one
edge in every set $E_v$. Hence the embedding (2.4) identifies $\mathcal{F}$ with a subset $F$ of
the Boolean cube $\{0,1\}^m$, where

\[ m = \sum_{v \in V} \lceil \log_2 |E_v| \rceil, \]

and provides us with Distance Oracle 2.2 for $F$. Given a point $a \in C_m$, by (2.4.1)
we compute weights $\gamma_e$ on the edges $E$ in $O(|E| \ln |E|)$ time (since every edge $e \in E$
belongs to exactly two sets $E_v$) and then find the minimum weight $\lambda$ of a perfect
matching in $G$ in $O(|V|^3)$ time. The distance $d(a,F)$ from $a$ to $F$ is equal to $\lambda$.

Typically, if the graph has $|V| = n$ vertices and $\Omega(n^2)$ edges, the dimension of the
straightforward embedding will be $O(n^2)$, whereas the dimension of the economical
embedding will be $O(n \ln n)$. We observe that for bipartite graphs we can reduce
the dimension further by a factor of at least 2 by choosing

\[ m = \min\Bigl\{ \sum_{v \in V^+} \lceil \log_2 |E_v| \rceil,\ \sum_{v \in V^-} \lceil \log_2 |E_v| \rceil \Bigr\}, \]

since every perfect matching $M \subset E$ will be a transversal of either partition
$E = \bigcup_{v \in V^+} E_v$ or $E = \bigcup_{v \in V^-} E_v$.

Another natural case of economical embedding 2.4 arises when $\mathcal{F}$ is the set of
common bases of two matroids on the same ground set, one of which is a transversal
matroid. It would be interesting to find out if similar economical embeddings can
be constructed for a broader class of families $\mathcal{F} \subset 2^X$, for example, when $\mathcal{F}$ consists
of "small" sets, that is, when $|Y| \ll |X|$ for any $Y \in \mathcal{F}$.

3. Estimating Cardinality from the Hamming Distance 

In this section, we obtain estimates of the cardinality of a subset $F \subset C_n$ if we
choose $d_i(0,1) = d_i(1,0) = 1$, $i = 1, \ldots, n$, in Distance Oracle 2.2. In other words,
we estimate $|F|$ provided we can compute the Hamming distance $\operatorname{dist}(x,F)$ to $F$
from any given point $x \in C_n$, cf. Definitions 2.1. Our main tool is the average
Hamming distance from a point to the set.

(3.1) Definition. Let $A \subset C_n$ be a subset of the Boolean cube. Let

\[ \Delta(A) = \frac{1}{2^n} \sum_{x \in C_n} \operatorname{dist}(x,A) \]

be the average Hamming distance from a point in the cube to the set $A$.
Obviously, $\Delta(A) \le \Delta(B)$ if $B \subset A$.
(3.2) Example: Set consisting of a single point. Suppose that the set $A$ is
a point. Without loss of generality we assume that $A = \{(0, \ldots, 0)\}$. Then, for
$x = (\xi_1, \ldots, \xi_n)$ we have $\operatorname{dist}(x,A) = \operatorname{dist}(x,0) = \xi_1 + \ldots + \xi_n$ and

\[ \Delta(A) = \frac{1}{2^n} \sum_{x \in C_n} \operatorname{dist}(x,0) = \frac{1}{2^n} \sum_{x \in C_n} (\xi_1 + \ldots + \xi_n) = \frac{n}{2}. \]

It follows then that $\Delta(A) \le n/2$ for any non-empty $A \subset C_n$ and that $\Delta(A) = n/2$
if and only if $A$ consists of a single point.

Our first objective is to present a probabilistic algorithm that computes $\Delta(A)$
approximately by averaging $\operatorname{dist}(x,A)$ for a number of randomly chosen $x \in C_n$.

(3.3) Algorithm for computing $\Delta(A)$

Input: A set $A \subset C_n$ defined by its Distance Oracle 2.2 and a number $\epsilon > 0$.
Output: A number $\alpha$ approximating $\Delta(A)$ within error $\epsilon$.

Algorithm: Let $k = \lceil 48n/\epsilon^2 \rceil$. Sample $k$ points $x_1, \ldots, x_k \in C_n$ independently at
random from the uniform distribution in the cube $C_n$. Apply Distance Oracle 2.2
to find $\operatorname{dist}(x_i, A)$, $i = 1, \ldots, k$. Compute

\[ \alpha = \frac{1}{k} \sum_{i=1}^{k} \operatorname{dist}(x_i, A). \]

Output $\alpha$.
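Algorithm 3.3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sample size is read as $k = \lceil 48n/\epsilon^2 \rceil$, the Distance Oracle is passed in as a plain function `dist_oracle` (a hypothetical interface), and the example set is the single point of Example 3.2, for which $\Delta(A) = n/2$.

```python
import math
import random

def estimate_average_distance(dist_oracle, n, eps, rng):
    """Average dist(x, A) over k uniformly random points x of the cube."""
    k = math.ceil(48 * n / eps ** 2)
    total = 0
    for _ in range(k):
        x = tuple(rng.randrange(2) for _ in range(n))  # uniform point of C_n
        total += dist_oracle(x)
    return total / k

# Example 3.2: A = {(0, ..., 0)}, so dist(x, A) is the number of ones in x.
n = 8
alpha = estimate_average_distance(lambda x: sum(x), n, eps=1.0,
                                  rng=random.Random(0))
```

With $n = 8$ and $\epsilon = 1$ the sketch makes 384 oracle calls, and `alpha` should land close to $n/2 = 4$.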

To prove that Algorithm 3.3 indeed approximates $\Delta(A)$ with the desired accuracy,
we need a couple of technical results. The first lemma supplies us with
important concentration inequalities for the Boolean cube.

(3.4) Lemma. Let $C_N = \{0,1\}^N$ be the Boolean cube and let $f : C_N \to \mathbb{R}$ be a
function such that

\[ |f(x) - f(y)| \le \operatorname{dist}(x,y) \quad \text{for all } x, y \in C_N. \]

Let

\[ E(f) = \frac{1}{2^N} \sum_{x \in C_N} f(x) \]

be the average value of $f$. Let $P$ denote the uniform probability measure on $C_N$,
so $P(A) = |A|/2^N$ for a set $A \subset C_N$.
Then for any $\delta \ge 0$

\[ P\{x \in C_N : |f(x) - E(f)| \ge \delta\} \le 2\exp\Bigl\{-\frac{\delta^2}{16N}\Bigr\}. \]

Proof. See Sections 6.2 and 7.9 of [Milman and Schechtman 86]. □

The next lemma provides a useful "scaling" trick. 
(3.5) Lemma. Let us fix positive integers $k$ and $n$ and let $N = kn$. Let us identify
$C_N = C_n \times \ldots \times C_n = (C_n)^k$. Thus a point $x \in C_N$ is identified with a $k$-tuple
$x = (x_1, \ldots, x_k)$, where $x_i \in C_n$ for $i = 1, \ldots, k$.

For a subset $A \subset C_n$, let $B = A \times \ldots \times A = A^k \subset C_N$. Then

\[ \operatorname{dist}(x,B) = \sum_{i=1}^{k} \operatorname{dist}(x_i, A) \quad \text{for any } x = (x_1, \ldots, x_k) \in C_N \]

and

\[ \Delta(B) = k\Delta(A). \]

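Proof. Clearly,

\[ \operatorname{dist}(x,y) = \sum_{i=1}^{k} \operatorname{dist}(x_i, y_i) \quad \text{for all } x, y \in C_N, \]

hence the first identity follows. Next,

\[ \Delta(B) = \frac{1}{2^N} \sum_{x \in C_N} \operatorname{dist}(x,B) = \frac{1}{2^{nk}} \sum_{x_1, \ldots, x_k \in C_n} \sum_{i=1}^{k} \operatorname{dist}(x_i, A) = \frac{k\,2^{n(k-1)}}{2^{nk}} \sum_{x \in C_n} \operatorname{dist}(x,A) = \frac{k}{2^n} \sum_{x \in C_n} \operatorname{dist}(x,A) = k\Delta(A). \]

□

Now we can prove correctness of Algorithm 3.3.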

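(3.6) Theorem. With probability at least 0.9, the output $\alpha$ of Algorithm 3.3
satisfies the inequality $|\Delta(A) - \alpha| \le \epsilon$.

Proof. Let $N = nk$ and let us identify $C_N = (C_n)^k$ as in Lemma 3.5. Let
$B = A^k \subset C_N$. Let $f : C_N \to \mathbb{R}$ be defined by $f(x) = \operatorname{dist}(x,B)$. Applying Lemma
3.4 with $\delta = k\epsilon$ and observing that $E(f) = \Delta(B)$, we conclude that

\[ P\{x : |\operatorname{dist}(x,B) - \Delta(B)| \ge k\epsilon\} \le 2\exp\Bigl\{-\frac{k^2\epsilon^2}{16N}\Bigr\} = 2\exp\Bigl\{-\frac{k\epsilon^2}{16n}\Bigr\} < 0.1. \]

Since by Lemma 3.5

\[ \Delta(B) = k\Delta(A) \quad \text{and} \quad \frac{1}{k} \sum_{i=1}^{k} \operatorname{dist}(x_i, A) = \frac{1}{k} \operatorname{dist}(x,B) \]

for $x = (x_1, \ldots, x_k)$, we conclude that

\[ P\Bigl\{x_1, \ldots, x_k : \Bigl|\frac{1}{k} \sum_{i=1}^{k} \operatorname{dist}(x_i, A) - \Delta(A)\Bigr| \ge \epsilon\Bigr\} = P\{x : |\operatorname{dist}(x,B) - \Delta(B)| \ge k\epsilon\} < 0.1, \]

and the proof follows. □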

Remark. Hence to evaluate $\Delta(A)$ within error $\epsilon$ we have to average $O(n\epsilon^{-2})$ values
$\operatorname{dist}(x_i, A)$. By doing that, we allow probability 0.1 of failure. As usual, to attain
a lower probability $\delta > 0$ of failure, one should run Algorithm 3.3 $O(\ln \delta^{-1})$ times
and then select the median of the computed $\alpha$'s (cf. [Jerrum et al. 86]). For all
applications, choosing $\epsilon = 1$ will suffice and in many cases $\epsilon = \sqrt{n}$ will do (cf.
Section 5 and [Barvinok 97a]). Hence, often we will have to apply Oracle 2.2 only
a constant number of times.
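The median trick mentioned in the Remark can be sketched as follows. The number of repetitions is an assumption of this sketch, not a constant from the paper: if each run fails with probability at most 0.1, Hoeffding's inequality bounds the probability that at least half of $r$ independent runs fail by $\exp(-0.32\,r)$, so any odd $r \ge 4\ln(1/\delta)$ suffices. The function names and the oracle interface are hypothetical.

```python
import math
import random
import statistics

def estimate_once(dist_oracle, n, eps, rng):
    """One run of the averaging procedure with k = ceil(48 n / eps^2) samples."""
    k = math.ceil(48 * n / eps ** 2)
    total = sum(dist_oracle(tuple(rng.randrange(2) for _ in range(n)))
                for _ in range(k))
    return total / k

def estimate_with_median(dist_oracle, n, eps, delta, rng):
    """Median of an odd number of independent runs (failure probability <= delta)."""
    runs = 2 * math.ceil(2 * math.log(1 / delta)) + 1  # odd, at least 4 ln(1/delta)
    estimates = [estimate_once(dist_oracle, n, eps, rng) for _ in range(runs)]
    return statistics.median(estimates)

# Single-point set A = {(0, ..., 0)}: dist(x, A) = number of ones, Delta(A) = n/2.
n = 6
alpha = estimate_with_median(lambda x: sum(x), n, eps=1.0, delta=0.01,
                             rng=random.Random(1))
```

For $\delta = 0.01$ this uses 21 runs of 288 oracle calls each.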

We would like to relate the value of $\Delta(A)$ to the cardinality $|A|$.

(3.7) Definition. Entropy Function. For $0 < x \le 1/2$ let

\[ H(x) = x \log_2 \frac{1}{x} + (1 - x) \log_2 \frac{1}{1 - x}. \]

We agree that $H(0) = 0$. Thus $H$ is an increasing concave function on the interval
$[0, 1/2]$.

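We use the following estimate (see, for example, Theorem 1.4.5 of [van Lint 99])

(3.7.1) \[ \sum_{k=0}^{r} \binom{n}{k} \le 2^{nH(r/n)} \quad \text{for } r \le n/2. \]

Also, we remark that around $x = +0$ we have

(3.7.2) \[ H(x) = x \log_2 \frac{1}{x} + O(x) \quad \text{and} \quad H\Bigl(\frac{1}{2} - x\Bigr) = 1 - \frac{2}{\ln 2}\,x^2 + O(x^4). \]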
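As a numerical sanity check of the estimate (3.7.1), the following Python sketch evaluates both sides on a logarithmic scale for small $n$. The range of $n$ and the floating-point tolerance are arbitrary choices made for this illustration.

```python
import math

def H(x):
    """Entropy function of Definition 3.7 on [0, 1/2], with H(0) = 0."""
    if x == 0:
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def binomial_tail_bounded(n, r, tol=1e-9):
    """Check log2(sum_{k<=r} C(n,k)) <= n * H(r/n), up to rounding."""
    lhs = math.log2(sum(math.comb(n, k) for k in range(r + 1)))
    return lhs <= n * H(r / n) + tol

checked = all(binomial_tail_bounded(n, r)
              for n in range(1, 61)
              for r in range(n // 2 + 1))
```

The check passes for all $1 \le n \le 60$ and $0 \le r \le n/2$; it illustrates the inequality but of course does not prove it.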

We will use the classical isoperimetric inequality for the Boolean cube (see, for
example, [Leader 91]).

(3.8) Harper's Theorem. Let $A \subset C_n$ be a set such that

\[ |A| \ge \sum_{k=0}^{r} \binom{n}{k} \]

for some integer $r$. Then, for any non-negative integer $t$

\[ |\{x \in C_n : \operatorname{dist}(x,A) \le t\}| \ge \sum_{k=0}^{r+t} \binom{n}{k}. \]
We are going to obtain an estimate of the cardinality of a set $A \subset C_n$ in terms
of the average Hamming distance $\Delta(A)$ from a point $x \in C_n$ to $A$. It is convenient
to express the estimate in terms of a related quantity

\[ \rho(A) = \frac{1}{2} - \frac{\Delta(A)}{n}. \]

As follows from Example 3.2, for every non-empty set $A \subset C_n$ we have $0 \le \rho(A) \le 1/2$.
We observe that $\rho(A) = 0$ if and only if $A$ consists of a single point and that
$\rho(A) = 1/2$ if and only if $A$ is the whole cube $C_n$.

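(3.9) Theorem. Let $A \subset C_n$ be a non-empty set. Let

\[ \rho = \frac{1}{2} - \frac{\Delta(A)}{n}. \]

Then

\[ 1 - H\Bigl(\frac{1}{2} - \rho\Bigr) \le \frac{\log_2 |A|}{n} \le H(\rho). \]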
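The two-sided bound of Theorem 3.9 can be checked exhaustively on very small cubes. The Python sketch below enumerates every non-empty subset of $C_n$ for $n \le 3$, computes $\rho$ exactly, and tests the inequalities up to a floating-point tolerance; the helper names and the tolerance are assumptions of this sketch.

```python
import math
from fractions import Fraction
from itertools import combinations, product

def H(x):
    """Entropy function of Definition 3.7 on [0, 1/2], with H(0) = 0."""
    x = float(x)
    if x == 0.0:
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def rho(A, n):
    """rho(A) = 1/2 - Delta(A)/n, computed exactly by enumerating the cube."""
    total = sum(min(sum(s != t for s, t in zip(x, y)) for y in A)
                for x in product((0, 1), repeat=n))
    return Fraction(1, 2) - Fraction(total, n * 2 ** n)

def bounds_hold(A, n, tol=1e-9):
    r = rho(A, n)
    alpha = math.log2(len(A)) / n
    return 1 - H(Fraction(1, 2) - r) - tol <= alpha <= H(r) + tol

def all_subsets_ok(n):
    cube = list(product((0, 1), repeat=n))
    return all(bounds_hold(A, n)
               for size in range(1, len(cube) + 1)
               for A in combinations(cube, size))

results = {n: all_subsets_ok(n) for n in (1, 2, 3)}
```

Both bounds hold with equality for a single point ($\rho = 0$) and for the whole cube ($\rho = 1/2$), which is why a small tolerance is used.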

Before we proceed with a formal proof, we would like to highlight some ideas. 

(3.10) The idea of the proof. Extremal sets. Let $A \subset C_n$ be a set.
Concentration inequalities (Lemma 3.4) imply that the average distance $\Delta(A)$ is
approximately equal to the distance $\operatorname{dist}(x,A)$ from a "typical" point $x \in C_n$ to $A$.
For a given positive integer $t$, let us consider the $t$-neighborhood
$A_t = \{x \in C_n : \operatorname{dist}(x,A) \le t\}$ of $A$. We expect that $\Delta(A) \approx t_1$, where $t_1$ is the
smallest value of $t$ such that $A_t$ covers "almost all" of the cube $C_n$. The neighborhood
$A_t$ grows the slowest when $A$ is a ball in the Hamming metric, that is, when
$A = \{x : \operatorname{dist}(x, x_0) \le r\}$ for some $x_0 \in C_n$ and some $r \ge 0$, as follows from Harper's
Theorem 3.8, cf. also [Leader 91]. Hence the upper bound for $n^{-1}\log_2|A|$ in Theorem 3.9
is attained (up to an $O(n^{-1/2})$ error term) when $A$ is a ball. The neighborhood $A_t$
grows the fastest when the points of $A$ are spread around in $C_n$. In any case, the size
$|A_t|$ does not exceed the sum of sizes of the balls of radius $t$ centered at the points of $A$.
Thus the lower bound for $n^{-1}\log_2|A|$ in Theorem 3.9 is obtained from this "packing"
type argument. One can show that if the points of $A$ are chosen at random in
$C_n$, then with high probability the lower bound is indeed attained asymptotically.
More precisely, let us fix a number $0 < \beta < 1$ and let $A$ be the set of $\lfloor 2^{\beta n} \rfloor$ points
chosen at random from $C_n$. Then with the probability that tends to 1 as $n$ grows to
infinity, $\beta = 1 - H(\tfrac{1}{2} - \rho) + O(n^{-1/2})$. The proof is straightforward, but technical
and therefore omitted.

Finally, we note that using the average distance $\Delta(A)$ and the scaling trick (Lemma
3.5) allows us to get rid of $O(n^{-1/2})$ error terms in the proof.

Proof of Theorem 3.9. Let us choose a positive even integer $m$, let $N = mn$ and
let us identify $C_N = (C_n)^m$, as in Lemma 3.5. Let $B = A^m \subset C_N$. Let us fix the
uniform probability measure $P$ on $C_N$.
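Let $\alpha = \log_2|A|/n$, so $|A| = 2^{\alpha n}$ and $|B| = 2^{\alpha N}$. Let $0 \le \gamma \le 1/2$ be a number
such that $H(\gamma) = \alpha$ and let $r = \lfloor N\gamma \rfloor$. Then by (3.7.1)

\[ |B| = 2^{NH(\gamma)} \ge \sum_{k=0}^{r} \binom{N}{k}. \]

Then Theorem 3.8 implies that

\[ |\{x \in C_N : \operatorname{dist}(x,B) \le N/2 - r\}| \ge \sum_{k=0}^{N/2} \binom{N}{k} \ge 2^{N-1}. \]

Therefore,

\[ P\{x \in C_N : \operatorname{dist}(x,B) \le N/2 - r\} \ge \frac{1}{2}. \]

We have that $x = (x_1, \ldots, x_m)$ for some $x_i \in C_n$ and that
$\operatorname{dist}(x,B) = \operatorname{dist}(x_1, A) + \ldots + \operatorname{dist}(x_m, A)$ (see Lemma 3.5). Therefore,

(1) \[ P\Bigl\{\frac{1}{m} \sum_{i=1}^{m} \operatorname{dist}(x_i, A) \le \frac{N}{2m} - \frac{r}{m}\Bigr\} \ge \frac{1}{2}. \]

Now we observe that

(2) \[ \frac{N}{2m} - \frac{r}{m} \to \frac{n}{2} - n\gamma \quad \text{as } m \to +\infty. \]

Furthermore, by the Law of Large Numbers,

(3) \[ \frac{1}{m} \sum_{i=1}^{m} \operatorname{dist}(x_i, A) \to \Delta(A) \quad \text{in probability as } m \to +\infty. \]

Hence the assumption that $\Delta(A) > n/2 - n\gamma$ would contradict (1)-(3). Thus we
must have $\Delta(A) \le n/2 - n\gamma$, which implies that $\gamma \le \rho(A)$. Hence $\alpha = H(\gamma) \le H(\rho)$
and the upper bound is proven.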

Let us prove the lower bound. We observe that for every point $b \in C_N$ and any
$N/2 \ge s \ge 0$

\[ |\{x \in C_N : \operatorname{dist}(x,b) \le s\}| = \sum_{k=0}^{s} \binom{N}{k} \le 2^{NH(s/N)}. \]

Therefore,

\[ |\{x \in C_N : \operatorname{dist}(x,B) \le s\}| \le |B|\,2^{NH(s/N)} = 2^{N(H(s/N) + \alpha)}. \]

Hence

\[ P\{x \in C_N : \operatorname{dist}(x,B) \le s\} \le 2^{N(H(s/N) + \alpha - 1)}. \]

Therefore,

(4) \[ P\Bigl\{x_1, \ldots, x_m : \frac{1}{m} \sum_{i=1}^{m} \operatorname{dist}(x_i, A) \le s/m\Bigr\} \le 2^{N(H(s/N) + \alpha - 1)}. \]

If $\Delta(A) = n/2$ then $A$ is a point and the lower bound in Theorem 3.9 is satisfied.
Otherwise, let us fix an $\epsilon > 0$ such that $(1 + \epsilon)\Delta(A)/n < 1/2$ and let
$s = \lceil m(1 + \epsilon)\Delta(A) \rceil$. We have

(5) \[ s/m \to (1 + \epsilon)\Delta(A) \quad \text{and} \quad s/N \to (1 + \epsilon)\Delta(A)/n \quad \text{as } m \to +\infty. \]

Hence the assumption that $H\bigl((1 + \epsilon)\Delta(A)/n\bigr) + \alpha - 1 < 0$ would contradict (3)-(5).
Therefore, $H\bigl((1 + \epsilon)\Delta(A)/n\bigr) + \alpha - 1 \ge 0$ for any $\epsilon > 0$ and $H(\Delta(A)/n) + \alpha - 1 \ge 0$.
Since $\Delta(A)/n = 0.5 - \rho$, the proof follows. □

For applications, the most interesting case is when $n^{-1}\log_2|A|$ is small, that is
$\rho \approx 0$.

(3.11) Corollary. There exist positive constants $c_1$ and $c_2$ such that for any
non-empty set $A \subset C_n$ and for $\rho = \frac{1}{2} - \frac{\Delta(A)}{n}$ we have

\[ c_1 \cdot \rho^2 \le \frac{\ln|A|}{n} \le c_2 \cdot \rho \ln\frac{1}{\rho}. \]

In particular, for any $c_1 < 2$ and any $c_2 > 1$, the inequality holds in a sufficiently
small neighborhood of $\rho = 0$.

Proof. Follows from Theorem 3.9 by (3.7.2). □

(3.12) Discussion. Figure 1 depicts the feasible region for $n^{-1}\log_2|A|$ as
described by Theorem 3.9.

Thus possible values of $n^{-1}\log_2|A|$ with the given value of $\rho$ form a vertical interval
between the two curves. As we discussed in Section 3.10, asymptotically both
bounds are sharp. Remarkably, the bounds converge at $\rho = 0$ and $\rho = 0.5$. On
the other hand, the difference is the greatest when $\rho = 1/4$. Thus, if the average
Hamming distance from a point $x \in C_n$ to a set $A \subset C_n$ is $n/4$, the set $A$ can
contain as many as $2^{H(1/4)n} \approx 2^{0.81n}$ points and as few as $2^{(1 - H(1/4))n} \approx 2^{0.19n}$ points.

Corollary 3.11 (with somewhat weaker constants and stated in different terms)
together with the observation that the distance $\operatorname{dist}(x,A)$ for a randomly chosen
point $x \in C_n$ allows one to estimate $\rho$ up to an $O(n^{-1/2})$ error constitute the main
result of the earlier paper [Barvinok 97a]. Consequently, the main conclusion of
[Barvinok 97a] is equivalent to stating that the Hamming distance to $A$ from a random
point $x$ in the Boolean cube allows one to decide whether $|A|$ is exponentially
large in $n$. Theorems 3.6 and 3.9 make improvements of two kinds. First, we obtain
sharp bounds valid for all $0 \le \rho \le 1/2$, and second, by averaging several random
distances (see Algorithm 3.3 and Theorem 3.6) we get rid of the $O(n^{-1/2})$ error
term. This allows us to obtain meaningful cardinality estimates for really small sets.
For example, if $A \subset C_n$ is a set such that $n^{-1}\log_2|A| \sim n^{-\alpha}$ for some $0 < \alpha < 1$,
by applying Algorithm 3.3 to approximate $\Delta(A)$ and Theorem 3.9 to interpret the
results, the worst lower bound we can get for $n^{-1}\log_2|A|$ is $\sim n^{-2\alpha}\ln^{-2} n$ (this
happens when $A$ is a ball in the Hamming metric, but we think it is a "random
set", see Section 3.10) and the worst upper bound is $\sim n^{-\alpha/2}\ln n$ (this happens
when $A$ is a "random set" but we think that it is a ball). Curiously, we can even
distinguish in polynomial time between a set consisting of a single point ($\rho = 0$)
and a set having more than one point (one can show that $\rho \ge c/n$ for some $c > 0$
in that case), although apparently we can't distinguish between sets consisting of
2 and 3 points respectively.

As we remarked earlier, in applications the value of $n^{-1}\log_2|A|$ is usually small
(cf. Examples 1.2 and 2.5). Therefore, it is of interest to tighten the bounds for
such sets. In the next section, we show that this is indeed possible: we demonstrate
how to modify the definition of $\rho$, so that it remains efficiently computable and so
that

\[ c_3 \cdot \rho^2 \ln\frac{1}{\rho} \le \frac{\ln|A|}{n} \le c_4 \cdot \rho \ln\frac{1}{\rho} \]

for some $c_3, c_4 > 0$, which improves the inequality of Corollary 3.11 in the
neighborhood of $\rho = 0$.

4. Randomized Hamming Distance 

Let us fix a number $0 < p \le 1$ and let $q = 1 - p$. In this section, we construct a
quantity $\Delta(A,p)$, which measures the cardinality of "small" subsets $A \subset C_n$ of the
Boolean cube in a somewhat more precise way than the average Hamming distance
$\Delta(A)$ discussed in Section 3. In fact, $\Delta(A,1) = \Delta(A)$, so $\Delta(A)$ is a particular case
of $\Delta(A,p)$.

(4.1) Definitions. Let $\Lambda_n$ be a copy of the Boolean cube $\{0,1\}^n$. We make $\Lambda_n$ a
probability space by letting

\[ P\{l\} = p^{|l|} q^{n - |l|}, \quad \text{where } |l| = \lambda_1 + \ldots + \lambda_n \text{ for } l = (\lambda_1, \ldots, \lambda_n). \]

Hence a vector $l = (\lambda_1, \ldots, \lambda_n)$ from $\Lambda_n$ is interpreted as a realization of $n$
independent random variables $\lambda_i$ such that $P\{\lambda_i = 1\} = p$ and $P\{\lambda_i = 0\} = q$.

For $x, y \in C_n$ and an $l \in \Lambda_n$, where $x = (\xi_1, \ldots, \xi_n)$, $y = (\eta_1, \ldots, \eta_n)$ and
$l = (\lambda_1, \ldots, \lambda_n)$, let

\[ d_l(x,y) = \sum_{i:\ \xi_i \ne \eta_i} \lambda_i. \]

In other words, we count disagreement in the $i$-th coordinate of $x$ and $y$ if and only
if the value of $\lambda_i$ is 1. Thus if $l = (1, \ldots, 1)$, we have $d_l(x,y) = \operatorname{dist}(x,y)$, the usual
Hamming distance.

For $l \in \Lambda_n$ and a set $A \subset C_n$, let

\[ d_l(x,A) = \min_{y \in A} d_l(x,y). \]

Finally, let

\[ \Delta(A,p) = \frac{1}{2^n} \sum_{x \in C_n} \sum_{l \in \Lambda_n} p^{|l|} q^{n - |l|}\, d_l(x,A). \]

In other words, $\Delta(A,p)$ is the expected value of $d_l(x,A)$, where $x = (\xi_1, \ldots, \xi_n)$ and
$l = (\lambda_1, \ldots, \lambda_n)$ are vectors of independent random variables such that
$P\{\lambda_i = 1\} = p$, $P\{\lambda_i = 0\} = q$ and $P\{\xi_i = 0\} = P\{\xi_i = 1\} = 1/2$. Obviously,
$\Delta(A,p) \le \Delta(B,p)$ if $B \subset A$.

It follows that for a fixed non-empty $A \subset C_n$, the value $\Delta(A,p)$ is a polynomial
in $p$ of degree at most $n$.

(4.2) Example: Set consisting of a single point. Suppose that the set $A$
consists of a single point. Without loss of generality we assume that $A = \{(0, \ldots, 0)\}$.
Then for $x = (\xi_1, \ldots, \xi_n)$ and $l = (\lambda_1, \ldots, \lambda_n)$,

\[ d_l(x,A) = \sum_{i=1}^{n} \lambda_i \xi_i. \]

Interpreting $\lambda_i$ and $\xi_i$, $i = 1, \ldots, n$, as independent random variables such that
$P\{\xi_i = 1\} = P\{\xi_i = 0\} = 1/2$ and $P\{\lambda_i = 1\} = p$, $P\{\lambda_i = 0\} = q$, we get

\[ \Delta(A,p) = E \sum_{i=1}^{n} \lambda_i \xi_i = \sum_{i=1}^{n} (E\,\lambda_i)(E\,\xi_i) = \frac{np}{2}. \]

It follows then that for any non-empty set $A \subset C_n$ we have $\Delta(A,p) \le np/2$
and that $\Delta(A,p) = np/2$ if and only if $A$ consists of a single point (we agreed that
$p > 0$).
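Definitions 4.1 and Example 4.2 can be checked by direct enumeration on a tiny cube. The Python sketch below computes $\Delta(A,p)$ exactly with rational arithmetic; the function names are hypothetical, and the enumeration over all pairs $(x, l)$ is only feasible for very small $n$.

```python
from fractions import Fraction
from itertools import product

def d_l(x, y, l):
    """Count disagreements of x and y only in coordinates where lambda_i = 1."""
    return sum(lam for xi, eta, lam in zip(x, y, l) if xi != eta)

def randomized_average_distance(A, n, p):
    """Delta(A, p): expectation of d_l(x, A) over uniform x and random mask l."""
    q = 1 - p
    cube = list(product((0, 1), repeat=n))
    total = Fraction(0)
    for l in cube:
        weight = p ** sum(l) * q ** (n - sum(l))  # probability of the mask l
        for x in cube:
            total += weight * min(d_l(x, y, l) for y in A)
    return total / 2 ** n

n = 3
p = Fraction(1, 3)
single_point = [(0, 0, 0)]
value = randomized_average_distance(single_point, n, p)
```

For the single point, `value` equals $np/2 = 1/2$ exactly, and setting $p = 1$ recovers the ordinary average Hamming distance $\Delta(A)$.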

As was the case with $\Delta(A)$, the functional $\Delta(A,p)$ can be easily computed by
averaging. For a set $A \subset C_n$ defined by its Distance Oracle 2.2 and any
$l = (\lambda_1, \ldots, \lambda_n)$ the value of $d_l(x,A)$ is computed by choosing the penalties
$d_i(0,1) = d_i(1,0) = 1$ when $\lambda_i = 1$ and $d_i \equiv 0$ when $\lambda_i = 0$.

(4.3) Algorithm for Computing $\Delta(A,p)$

Input: A set $A \subset C_n$ given by its Distance Oracle 2.2, a number $1 \ge p > 0$ and
an $\epsilon > 0$.

Output: A number $\alpha$ approximating $\Delta(A,p)$ within error $\epsilon$.

Algorithm: Let $k = \lceil 64n/\epsilon^2 \rceil$. Sample $k$ points $x_1, \ldots, x_k \in C_n$ independently
at random from the uniform distribution in $C_n$ and $k$ points $l_1, \ldots, l_k \in \Lambda_n$
independently at random from the distribution in $\Lambda_n$. Apply Distance Oracle 2.2 to
compute $d_{l_i}(x_i, A)$, $i = 1, \ldots, k$. Compute

\[ \alpha = \frac{1}{k} \sum_{i=1}^{k} d_{l_i}(x_i, A). \]

Output $\alpha$.
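Algorithm 4.3 differs from the plain averaging procedure only in that each sample also draws a random mask and passes the corresponding penalties to the Distance Oracle. A minimal Python sketch, assuming the sample size reads $k = \lceil 64n/\epsilon^2 \rceil$ and using a hypothetical oracle interface `dist_oracle(x, penalties)`:

```python
import math
import random

def hamming_penalty(s, t):
    return int(s != t)

def zero_penalty(s, t):
    return 0

def estimate_randomized_distance(dist_oracle, n, p, eps, rng):
    """Average d_l(x, A) over k random pairs (x, l)."""
    k = math.ceil(64 * n / eps ** 2)
    total = 0
    for _ in range(k):
        x = tuple(rng.randrange(2) for _ in range(n))             # uniform in C_n
        l = tuple(1 if rng.random() < p else 0 for _ in range(n))  # P{lambda_i = 1} = p
        # Hamming penalty where lambda_i = 1, identically zero where lambda_i = 0
        penalties = [hamming_penalty if lam else zero_penalty for lam in l]
        total += dist_oracle(x, penalties)
    return total / k

def single_point_oracle(x, penalties):
    """Distance Oracle 2.2 for A = {(0, ..., 0)} (illustrative stand-in)."""
    return sum(d(xi, 0) for xi, d in zip(x, penalties))

n, p = 8, 0.5
alpha = estimate_randomized_distance(single_point_oracle, n, p, eps=1.0,
                                     rng=random.Random(2))
```

By Example 4.2 the target value here is $np/2 = 2$.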

(4.4) Theorem. With probability at least 0.9, the output $\alpha$ of Algorithm 4.3
satisfies the inequality $|\Delta(A,p) - \alpha| \le \epsilon$.

We postpone the proof till Section 6.
We are going to obtain estimates of the cardinality $|A|$ of a set $A \subset C_n$ in terms
of the quantity $\Delta(A,p)$. As in Section 3, it is convenient to work with a related
quantity

\[ \rho = \rho(A,p) = \frac{p}{2} - \frac{\Delta(A,p)}{n}. \]

From Definitions 4.1, for any non-empty $A \subset C_n$, the function $\rho(A,p)$ is a polynomial
in $p$ of degree at most $n$. As follows from Example 4.2, $0 \le \rho \le p/2$ for
any non-empty set $A \subset C_n$. Our estimate will be useful for "small" sets $A$ where
$n^{-1}\ln|A|$ is close to 0.

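(4.5) Theorem. Let $A \subset C_n$ be a non-empty set. Let

$$\rho = \frac{p}{2} - \frac{\Delta(A, p)}{n}.$$

Then

$$(4.5.1) \qquad \frac{\rho^2}{p} \le \frac{\ln |A|}{n}.$$

Suppose that $\rho < 1/4$ and that

$$(4.5.2) \qquad p \ge \frac{\ln 2 + \ln(1 - 2\rho)}{\ln(1 - 2\rho) - \ln(2\rho)}.$$

Then

$$(4.5.3) \qquad \frac{\ln |A|}{n} \le 2\rho \ln \frac{1}{2\rho} + (1 - 2\rho) \ln \frac{1}{1 - 2\rho}.$$

As we remarked earlier, the case interesting for applications is when $|A|$ is small,
meaning that $n^{-1} \ln |A| \approx 0$.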
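To make the use of Theorem 4.5 concrete, the following sketch evaluates the lower bound (4.5.1) and, when the hypotheses $\rho < 1/4$ and (4.5.2) hold, the upper bound (4.5.3) for given values of $\rho$ and $p$. The function names are hypothetical and the code only restates the formulas of the theorem.

```python
import math

def threshold_p(rho):
    """Right-hand side of (4.5.2); requires 0 < rho < 1/4."""
    return (math.log(2) + math.log(1 - 2 * rho)) / (
        math.log(1 - 2 * rho) - math.log(2 * rho)
    )

def cardinality_bounds(rho, p):
    """Return (lower, upper) bounds on ln|A| / n.
    `upper` is None when the hypotheses of (4.5.3) are not satisfied."""
    lower = rho ** 2 / p                      # (4.5.1)
    upper = None
    if 0 < rho < 0.25 and p >= threshold_p(rho):
        upper = (2 * rho * math.log(1 / (2 * rho))
                 + (1 - 2 * rho) * math.log(1 / (1 - 2 * rho)))  # (4.5.3)
    return lower, upper
```

For example, for an $m$-dimensional face with $n = 100$, $m = 10$ and $p = 0.5$ one has $\rho = pm/2n = 0.025$, and the true value $n^{-1}\ln|A| = 0.1 \ln 2$ lies between the two returned bounds.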

(4.6) Corollary. Let us choose any $c_3 < 1/(\ln 2) \approx 1.44$ and any $c_4 > 2$. Then
there exists a $\delta > 0$ such that for any non-empty $A \subset C_n$ with $n^{-1} \ln |A| \le \delta$ there
exists a $0 < p < 1$ such that for $\rho = \frac{p}{2} - \frac{\Delta(A, p)}{n}$ one has

$$c_3 \cdot \rho^2 \ln \frac{1}{\rho} \le \frac{\ln |A|}{n} \le c_4 \cdot \rho \ln \frac{1}{\rho}.$$

Proof. By (4.5.1), $\rho \le \sqrt{n^{-1} \ln |A|} \le \sqrt{\delta}$, so $\rho(A, p)$ is small if $\delta$ is small, no matter
what $p$ is. We observe that for small positive $\rho$ the right hand side of (4.5.2) is of the
order $(\ln 2) \ln^{-1}(1/\rho)$ and the right hand side of (4.5.3) is of the order $2\rho \ln(1/\rho)$.
Given $c_3 < (\ln 2)^{-1}$ and $c_4 > 2$, let us choose $1/16 > \delta > 0$ in such a way that
the right hand side of (4.5.2) does not exceed $(c_3)^{-1} \ln^{-1}(1/\rho)$ and the right hand
side of (4.5.3) does not exceed $c_4 \rho \ln(1/\rho)$ for all $0 < \rho \le \sqrt{\delta} < 1/4$.

We recall that $|A| = 1$ if and only if $\rho = 0$, in which case the bounds of Corollary
4.6 are satisfied by default. Given a set $A \subset C_n$, $|A| > 1$, let us choose the smallest
$p > 0$ that satisfies the inequality (4.5.2). Then $0 < p < 1$ since the right hand
side of (4.5.2) is bounded away from 0 as a function of $p$ and $n$ and smaller than
1 for $0 < \rho < 1/4$. Since $\rho(A, p)$ depends continuously on $p$, we must have equality
in (4.5.2) (otherwise, we could have taken a smaller $p$). Thus $p \le (c_3)^{-1} \ln^{-1}(1/\rho)$
and the proof follows by (4.5.1)-(4.5.3). □

(4.7) Extremal sets. Let us fix a $0 < p < 1$ and an $\epsilon > 0$. Then there exists
an $\alpha = \alpha(p, \epsilon) > 0$ with the following property: if $A \subset C_n$ is a set of $\lfloor 2^{\alpha n} \rfloor$ points
randomly chosen from the Boolean cube, then with the probability that tends to
1 as $n$ grows to infinity, $n^{-1} \ln |A| \le (2 + \epsilon)\rho^2/p$. Hence for any $p > 0$ the bound
(4.5.1) is tight up to a constant factor for sufficiently small random sets. The proof
is rather technical and therefore omitted.

One can show that as long as $p$ satisfies (4.5.2), the bound (4.5.3) is asymptotically
attained on small faces of the cube $C_n$. Let us fix a $\delta > 0$ (to be adjusted
later), let $m = \lfloor \delta n \rfloor$ and let $A \subset C_n$ be an $m$-dimensional face of the Boolean cube:

$$A = \{(\xi_1, \ldots, \xi_n) : \xi_i = 0 \ \text{for} \ i = m + 1, \ldots, n\}.$$

Thus $|A| = 2^m$. Moreover, a computation similar to that of Example 4.2 shows
that $\rho(A, p) = pm/2n$. Hence we have

$$\frac{\ln |A|}{n} = \frac{2 \ln 2}{p} \rho(A, p).$$

We observe that $\rho(A, p) \le \delta/2$. Hence for any small $\epsilon > 0$ one can find $\delta = \delta(\epsilon) > 0$
such that there exists $p$ satisfying (4.5.2) and such that $p \le (1 + \epsilon)(\ln 2) \ln^{-1}(1/\rho)$.
For such a $p$, we have

$$\frac{\ln |A|}{n} \ge \frac{2}{1 + \epsilon} \rho \ln \frac{1}{\rho},$$

so the bound (4.5.3) is indeed asymptotically tight for small sets.
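The identity $\rho(A, p) = pm/2n$ for a face, and the value $np/2$ of Example 4.2 for a single point, can be checked by exact enumeration on a small cube. The sketch below is only a sanity check under the definition $\Delta(A, p) = E\, d_l(x, A)$; the helper names `exact_delta`, `face` and `rho` are hypothetical.

```python
from itertools import product

def exact_delta(A, n, p):
    """Exact Delta(A, p) = E d_l(x, A), enumerating all x and all l in {0,1}^n."""
    q = 1 - p
    total = 0.0
    for l in product((0, 1), repeat=n):
        weight = p ** sum(l) * q ** (n - sum(l)) / 2 ** n   # P{(x, l)}
        for x in product((0, 1), repeat=n):
            total += weight * min(
                sum(li for li, xi, yi in zip(l, x, y) if xi != yi) for y in A
            )
    return total

def face(n, m):
    """m-dimensional face: the last n - m coordinates are fixed to 0."""
    return [y + (0,) * (n - m) for y in product((0, 1), repeat=m)]

def rho(A, n, p):
    """rho(A, p) = p/2 - Delta(A, p)/n."""
    return p / 2 - exact_delta(A, n, p) / n
```

With $n = 4$, $m = 2$ and $p = 0.3$, `rho(face(4, 2), 4, 0.3)` agrees with $pm/2n = 0.075$ up to rounding.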

Apparently, the sets $A$ having the largest cardinality among all sets with the
given value of $\rho(A, p)$ evolve from the balls in the Hamming metric for $p = 1$ (see
Section 3.10) to faces as $p \to 0$. Since faces are packed somewhat less tightly than
balls, we gain in Corollary 4.6 as compared to Corollary 3.11.

The proof of Theorem 4.5 is postponed till Section 6. 

Corollary 4.6 implies that for small sets $A$, by "tuning up" $p$ we can get an additional
logarithmic factor which brings the lower bound for $n^{-1} \ln |A|$ a little closer to
the upper bound compared to the bound of Corollary 3.11. Any $p$ which is only
slightly bigger than the bound (4.5.2) will do. For example, if $A \subset C_n$ is a set
such that $n^{-1} \ln |A| \sim n^{-\alpha}$ for some $0 < \alpha < 1$, it follows by (4.5.1) that
$\rho(A, p) = O(n^{-\alpha/2})$ for any $p$. Then we can choose some $p = O(1/\ln n)$ that satisfies (4.5.2)
(a particular suitable value of $p$ can be found, for example, by dichotomy). Applying
Algorithm 4.3 to approximate $\Delta(A, p)$ and Theorem 4.5 to interpret the results, for
$n^{-1} \ln |A|$ we would obtain a lower bound of the form $\sim n^{-2\alpha}/\ln n$ at worst and
an upper bound of the form $\sim n^{-\alpha/2} \sqrt{\ln n}$ at worst, which is somewhat better
than the bounds that could possibly be obtained by using the standard Hamming
distance, see Section 3.12.

We are not going to use $\Delta(A, p)$ in what follows, but we find it interesting that
some improvement in the cardinality estimate can be achieved by simply ignoring
a (random) part of the information contained in the standard Hamming distance.

5. Application: Approximating the Permanent of a 0-1 Matrix 

Let $A = (a_{ij})$ be an $n \times n$ matrix. The permanent of $A$ is defined by the
expression

$$\operatorname{per} A = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i\sigma(i)},$$

where $S_n$ is the symmetric group of all permutations of the set $\{1, \ldots, n\}$. If
$a_{ij} \in \{0, 1\}$ for all $i$ and $j$ then $\operatorname{per} A$ counts perfect matchings in a bipartite graph
$G_A = (V, E)$, constructed as follows. Let $V = V^+ \cup V^-$ be the set of vertices, where
$V^+ = \{1^+, \ldots, n^+\}$ and $V^- = \{1^-, \ldots, n^-\}$, and let $e = (i^+, j^-)$ be an edge of
$G_A$ if and only if $a_{ij} = 1$. Then $\operatorname{per} A$ is equal to the number of perfect matchings
in $G_A$, cf. Example 1.2. The problem of computing $\operatorname{per} A$ is #P-hard [Valiant 79]
and polynomial time algorithms for computing $\operatorname{per} A$ exactly are known only in a few
particular cases. For example, if the graph $G_A$ is planar (see [Lovasz and Plummer
86]), or more generally, has the genus bounded by some absolute constant [Gallucio
and Loebl 99] then $\operatorname{per} A$ can be computed in polynomial time. If the permanent
of a 0-1 matrix is small (bounded by a polynomial in the size $n$ of the matrix), it
can be computed in polynomial time, see Section 7.3 of [Minc 78] and [Grigoriev
and Karpinski 87]. Finally, the permanent of matrices (real or complex) of a small
(fixed) rank is computable in polynomial time [Barvinok 96].
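For very small matrices the definition of the permanent and its interpretation as the number of perfect matchings in $G_A$ can be checked directly. The following brute-force sketch runs in time proportional to $n!$ and is meant only as an illustration of the definition; the function names are hypothetical.

```python
from itertools import permutations

def permanent(a):
    """per A = sum over permutations sigma of prod_i a[i][sigma(i)]."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= a[i][sigma[i]]
        total += term
    return total

def count_perfect_matchings(a):
    """Number of perfect matchings in G_A: permutations using only edges with a_ij = 1."""
    n = len(a)
    return sum(
        1 for sigma in permutations(range(n))
        if all(a[i][sigma[i]] == 1 for i in range(n))
    )
```

For a 0-1 matrix the two functions return the same number, since each product in the permanent is 1 exactly when the permutation selects only edges of $G_A$.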

Since the exact computation is difficult, the next goal is to find a "very good"
approximation algorithm. A fully polynomial time (randomized) approximation
scheme is a (probabilistic) algorithm that for any given $\epsilon > 0$ approximates the
desired quantity within relative error $\epsilon$ in time polynomial in $\epsilon^{-1}$. Probabilistic
methods based on rapidly mixing Markov chains resulted in finding such approximation
schemes for permanents of dense 0-1 matrices (that is, the matrices with
at least $n/2$ 1's in every row and column), random matrices and some special 0-1
matrices (see [Jerrum and Sinclair 89] and [Jerrum and Sinclair 97]). However, for
the class of all 0-1 matrices no fully polynomial time randomized approximation
scheme is known (but there is a "mildly exponential" approximation scheme, see
[Jerrum and Vazirani 96]).

In [Barvinok 97b], [Barvinok 99] and [Linial et al. 20+] a more modest goal
was posed and achieved. Given an arbitrary non-negative $n \times n$ matrix $A$, the
polynomial time algorithms [Barvinok 97b] and [Barvinok 99] (randomized) and
[Linial et al. 20+] (deterministic) produce a number $\alpha$ such that

$$(5.1) \qquad c^n \operatorname{per} A \le \alpha \le \operatorname{per} A$$

for some absolute constant $c > 0$. Currently the best values of $c$ are $c \approx 0.76$
for the randomized algorithm of [Barvinok 99] and $c \approx 0.37$ for the deterministic
algorithm of [Linial et al. 20+]. We also note that any polynomial time algorithm
achieving a subexponential approximation error can be "upgraded" to a polynomial
time approximation scheme, see [Barvinok 99].

Let $A$ be an $n \times n$ matrix of 0's and 1's. If $\operatorname{per} A$ is "big" (for example, if $\operatorname{per} A$
is of the order $n!/2^n$, which is the average value of the permanent for all $n \times n$
0-1 matrices), the additional factor of $c^n$ in (5.1) should not be considered as a
heavy liability. But if per A is "small" (for example, if per A is of the order 2°-°^"'), 
the lower bound in (5.1) is useless and the a produced by the algorithms may 
well be less than 1. The method developed in this section is designed to provide 
a partial remedy in this situation of a small permanent. Our approach should be
considered within the growing family of algorithms that provide crude yet fast
and universally applicable estimates.

Our algorithm for estimating the permanent of a 0-1 matrix $A$ consists of constructing
a graph $G_A$ as above, finding an economical embedding of the set $\mathcal{F}$ of
perfect matchings in $G_A$ into a Boolean cube (Section 2.4) and estimating the
cardinality $|\mathcal{F}|$ using Algorithm 3.3 and Theorem 3.9. We present a summary below.

(5.2) Algorithm for approximating the permanent.

Given an $n \times n$ 0-1 matrix $A = (a_{ij})$, let $G = (V, E)$ be the graph with the set
of vertices $V = V^+ \cup V^-$, where $V^+ = \{1^+, \ldots, n^+\}$ and $V^- = \{1^-, \ldots, n^-\}$, and
the set of edges $E = \{(i^+, j^-) : a_{ij} = 1\}$. Let $s_i^+$ be the degree of $i^+$ (the $i$-th
row sum of $A$) and let $s_j^-$ be the degree of $j^-$ (the $j$-th column sum of $A$). Let us
compute

$$m^+ = \sum_{i=1}^{n} \lceil \log_2 s_i^+ \rceil \quad \text{and} \quad m^- = \sum_{j=1}^{n} \lceil \log_2 s_j^- \rceil$$

and let $m = \min(m^+, m^-)$.

Following Section 2.5, we construct an embedding of the set of the perfect matchings
in $G$ into $\{0, 1\}^m$.

Without loss of generality we assume that $m = m^+$ (otherwise we switch $V^+$
and $V^-$, which corresponds to transposing $A$). Let $m_i = \lceil \log_2 s_i^+ \rceil$, so
$m = m_1 + \ldots + m_n$.

To every edge $e = (i^+, j^-)$ of $G$ incident with $i^+$, let us assign a binary string
$\phi_i(e)$ of length $m_i$ so that $\phi_i(e_1) \ne \phi_i(e_2)$ for every pair of distinct edges $e_1$ and $e_2$
with the same endvertex $i^+$.

Given a precision $\epsilon > 0$, let us generate $k = \lceil 48/(m\epsilon^2) \rceil$ random binary strings
$x_1, \ldots, x_k$, each of length $m$.

For each $x = x_i$, $i = 1, \ldots, k$, let us do the following procedure:

Consider $x$ as a string of $n$ substrings, $x = y_1 \ldots y_n$, where $y_i$ is a binary string of
length $m_i$. To every edge $e = (i^+, j^-)$ of $G$ assign the weight $\gamma_e = \operatorname{dist}(\phi_i(e), y_i)$, where
dist is the Hamming distance between binary strings. Find the minimum weight
$\alpha = \alpha(x)$ of a perfect matching in $G$ using the Assignment Problem algorithm, see
Section 11.2 of [Papadimitriou and Steiglitz 98].

Compute the average 



Compute 
Output $\beta$.
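The sampling part of Algorithm 5.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Assignment Problem is solved by brute force over permutations (suitable only for small $n$), the strings $\phi_i(e)$ are taken to be binary expansions of edge indices, and the sketch stops at the average of the values $\alpha(x_i)$ because the final formula for the output is not legible in this copy of the text. All function names are hypothetical, and the matrix is assumed to satisfy $\operatorname{per} A > 0$ and $m > 0$.

```python
import math
import random
from itertools import permutations

def code_lengths(a):
    """m_i = ceil(log2 s_i) for each row sum s_i >= 1, in integer arithmetic."""
    return [(sum(row) - 1).bit_length() for row in a]

def min_weight_matching(a, weight):
    """Brute-force stand-in for an Assignment Problem solver (small n only)."""
    n = len(a)
    return min(
        sum(weight(i, sigma[i]) for i in range(n))
        for sigma in permutations(range(n))
        if all(a[i][sigma[i]] == 1 for i in range(n))
    )

def average_matching_distance(a, eps, rng):
    """Return (average of alpha(x_1), ..., alpha(x_k), m, k)."""
    transposed = [list(col) for col in zip(*a)]
    if sum(code_lengths(transposed)) < sum(code_lengths(a)):
        a = transposed                       # ensure m = m^+
    lengths = code_lengths(a)
    m = sum(lengths)
    k = math.ceil(48 / (m * eps ** 2))
    # phi_i: the t-th edge at row vertex i gets the m_i-bit binary expansion of t
    codes = []
    for row, m_i in zip(a, lengths):
        cols = [j for j, entry in enumerate(row) if entry == 1]
        codes.append({j: [(t >> b) & 1 for b in range(m_i)] for t, j in enumerate(cols)})
    total = 0
    for _ in range(k):
        y = [[rng.randint(0, 1) for _ in range(m_i)] for m_i in lengths]

        def weight(i, j):
            # Hamming distance between phi_i(e) and the substring y_i
            return sum(u != v for u, v in zip(codes[i][j], y[i]))

        total += min_weight_matching(a, weight)
    return total / k, m, k
```

For the $2 \times 2$ all-ones matrix the two perfect matchings embed as the points $(0,1)$ and $(1,0)$ of $\{0,1\}^2$, so the exact average distance is $1/2$ and the sampled average should be close to it.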

(5.3) Theorem. Let $A$ be an $n \times n$ 0-1 matrix such that $\operatorname{per} A > 0$.

Let $s_1^+, \ldots, s_n^+$ be the row sums and let $s_1^-, \ldots, s_n^-$ be the column sums of $A$. Let

$$m = \min\Big\{ \sum_{i=1}^{n} \lceil \log_2 s_i^+ \rceil, \ \sum_{i=1}^{n} \lceil \log_2 s_i^- \rceil \Big\}.$$

With probability at least 0.9, the output $\beta$ of Algorithm 5.2 satisfies

where $0 \le \rho \le 1/2$ is a number such that

and $H(x) = x \log_2 \frac{1}{x} + (1 - x) \log_2 \frac{1}{1 - x}$ is the entropy function. To find $\beta$,
Algorithm 5.2 solves $k = \lceil 48/(m\epsilon^2) \rceil$ Assignment Problems of size $n \times n$.

Proof. Let $\mathcal{F}$ be the set of perfect matchings in the graph $G = G_A$. The proof
follows by the "economical embedding" construction of Section 2.5, Algorithm 3.3,
Theorem 3.6 and Theorem 3.9. □

The estimate of Theorem 5.3, however crude, allows us, for example, to decide
in polynomial time whether the permanent of a given $n \times n$ 0-1 matrix is
subexponential in $n$. The precise statement is as follows.

(5.4) Corollary. Let us fix an $0 < \alpha < 1$ and let us choose any $\beta > (1 + \alpha)/2$.
Suppose that $A$ is an $n \times n$ 0-1 matrix such that $\operatorname{per} A \le 2^{n^{\alpha}}$. Let us apply Algorithm
5.2 with $\epsilon = 1/m$. Then, for all sufficiently large $n$, the estimates of Theorem 5.3
allow us to conclude that $\operatorname{per} A \le 2^{n^{\beta}}$.

Proof. We observe that $m \le n(\log_2 n + 1)$. By (3.7.2), cf. also Corollary 3.11, we
conclude that $\rho = O(n^{\alpha/2} m^{-1/2})$ and the proof follows by Theorem 5.3 and (3.7.2).

□ 

a. 

Similarly, one can show that if per A > 2" then for any /3 < 2q; — 1, Algorithm 
5.2 with e = 1/m would allow us to conclude that per A > 2" for all sufficiently 
large n. The estimate is, of course, void for (3 < 1/2, but it is getting better as (3 

95 

approaches 1. For example, if per^ has the order of 2" , Algorithm 5.2 would 
allow us to conclude that per A is greater than 2" and is smaller than 2^ 

Corollary 5.4 demonstrates something that none of the exponential error algo- 
rithms (cf. (5.1)) can possibly do (neither can any other polynomial time algorithm 
known to the authors). On the other hand, algorithms of [Barvinok 97b], [Barvinok 
99] and [Linial et al. 20+] are better than Algorithm 5.2 for matrices with large 
permanent. Another interesting feature of Algorithm 5.2 is that it clearly favors 
sparse matrices, as the value of m (the dimension of the cubical embedding, see Ex- 
ample 2.5) for such matrices is smaller. Algorithms from [Barvinok 97b], [Barvinok 
99] and [Linial et al. 20+] seem to be completely indifferent to sparseness and even 
show some inclination to like dense matrices better. Thus, in the case of m = 0{n) 
Algorithm 5.2 beats the said algorithms on a wider range of permanents (for ex- 
ample, of the order 2*^ *^^"^). The final remark is about practical implementation of 
Algorithm 5.2. If per A is expected to be large enough (say, of the order 2""^ for 
some positive a), it suffices to choose e = 7m~^/^, for example. Thus, Algorithm 
5.2 boils down to solving one Assignment Problem. The algorithm should be able to 
handle reasonably sparse matrices with the size n of the order of several hundreds. 

Our method applies just as well to counting perfect matchings in non-bipartite
graphs, which is a more general problem. We discussed the bipartite case in detail
because of its connection with the permanent, a problem with a rich history and
plenty of results available for comparison.

6. Proofs of Theorems 4.4 and 4.5 

(6.1) Definition. We recall that $C_N$ is the Boolean cube $\{0, 1\}^N$ endowed with
the uniform probability measure and that $\Lambda_N$ is the Boolean cube $\{0, 1\}^N$ endowed
with the probability measure of Definition 4.1. Let $\Omega_N = C_N \times \Lambda_N$. We consider
the product measure on $\Omega_N$, so

$$P\{(x, l)\} = p^{|l|} q^{N - |l|} 2^{-N}, \quad \text{where} \ |l| = \lambda_1 + \ldots + \lambda_N \ \text{for} \ l = (\lambda_1, \ldots, \lambda_N).$$

Hence a point $(x, l) \in \Omega_N$ is interpreted as a vector of $2N$ independent random
variables $(\xi_1, \ldots, \xi_N; \lambda_1, \ldots, \lambda_N)$, where $P\{\xi_i = 0\} = P\{\xi_i = 1\} = 1/2$,
$P\{\lambda_i = 1\} = p$ and $P\{\lambda_i = 0\} = q$. We observe that

$$(6.1.1) \qquad \Delta(A, p) = E\, d_l(x, A).$$

First, we need a version of the concentration inequality (3.4).

(6.2) Lemma. Let $A \subset C_N$ be a set. Then for every $\delta > 0$

$$P\{(x, l) \in \Omega_N : |d_l(x, A) - \Delta(A, p)| \ge \delta + 4\sqrt{N}\} \le 4e^{-\delta^2/N}.$$

Proof. Given an $A \subset C_N$, let $f : \Omega_N \to \mathbb{R}$ be defined by $f(x, l) = d_l(x, A)$. Let
$M$ be the median of $f$, that is, a number such that

$$P\{(x, l) \in \Omega_N : f(x, l) \le M\} \ge 1/2 \quad \text{and} \quad P\{(x, l) \in \Omega_N : f(x, l) \ge M\} \ge 1/2.$$

Since $f$ is a function with Lipschitz constant 1, it follows by inequality (2.1.3) of
[Talagrand 95] that

$$P\{(x, l) \in \Omega_N : |f(x, l) - M| \ge \delta\} \le 4e^{-\delta^2/N}$$

for any $\delta > 0$.

Since $f$ is integer-valued, we can choose $M$ to be integer. Then

$$E|f(x, l) - M| = \sum_{k=1}^{+\infty} k\, P\{(x, l) : |f(x, l) - M| = k\} = \sum_{k=1}^{+\infty} P\{(x, l) : |f(x, l) - M| \ge k\}$$

$$\le 4 \sum_{k=1}^{+\infty} e^{-k^2/N} \le 4 \int_0^{+\infty} e^{-x^2/N}\, dx = 2\sqrt{\pi N} < 4\sqrt{N}.$$

Since by (6.1.1) we have $\Delta(A, p) = E f$, we conclude that $|\Delta(A, p) - M| < 4\sqrt{N}$.
Therefore,

$$P\{(x, l) : |d_l(x, A) - \Delta(A, p)| \ge \delta + 4\sqrt{N}\} \le P\{(x, l) : |d_l(x, A) - M| \ge \delta\} \le 4\exp\{-\delta^2/N\}.$$

□



Next, we need an analogue of the scaling trick (3.5).

(6.3) Lemma. Let us fix positive integers $k$ and $n$ and let $N = kn$. Let us identify
$C_N = (C_n)^k$, $\Lambda_N = (\Lambda_n)^k$ and $\Omega_N = (\Omega_n)^k$. Thus a point $(x, l) \in \Omega_N$ is identified
with $X = (x_1, \ldots, x_k; l_1, \ldots, l_k)$, where $x_i \in C_n$ and $l_i \in \Lambda_n$.
For a subset $A \subset C_n$, let $B = A^k \subset C_N$. Then

$$d_l(x, B) = \sum_{i=1}^{k} d_{l_i}(x_i, A) \quad \text{and} \quad \Delta(B, p) = k \Delta(A, p).$$

Proof. Clearly,

$$d_l(x, y) = \sum_{i=1}^{k} d_{l_i}(x_i, y_i) \quad \text{for all} \ x, y \in C_N$$

and the first identity follows. Now, by (6.1.1)

$$\Delta(B, p) = E\, d_l(x, B) = \sum_{i=1}^{k} E\, d_{l_i}(x_i, A) = k \Delta(A, p).$$

□

Now we are ready to prove Theorem 4.4. 

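Proof of Theorem 4.4. Let $N = nk$ and let us identify $C_N = (C_n)^k$, $\Lambda_N = (\Lambda_n)^k$
and $\Omega_N = (\Omega_n)^k$. Let $B = A^k \subset C_N$ as in Lemma 6.3. Applying Lemma 6.2, we
get

$$P\{(x, l) \in \Omega_N : |d_l(x, B) - \Delta(B, p)| \ge \delta + 4\sqrt{N}\} \le 4e^{-\delta^2/N}$$

for any $\delta > 0$. Using Lemma 6.3, we conclude:

$$P\Big\{(x, l) \in \Omega_N : \Big|\frac{1}{k} \sum_{i=1}^{k} d_{l_i}(x_i, A) - \Delta(A, p)\Big| \ge \delta/k + 4\sqrt{n/k}\Big\} \le 4e^{-\delta^2/N}.$$

Let us choose $\delta = \epsilon k/2$. Hence

$$P\Big\{(x, l) \in \Omega_N : \Big|\frac{1}{k} \sum_{i=1}^{k} d_{l_i}(x_i, A) - \Delta(A, p)\Big| \ge \epsilon/2 + 4\sqrt{n/k}\Big\} \le 4e^{-\epsilon^2 k/(4n)}.$$

Since $k \ge 64n/\epsilon^2$, we have $4\sqrt{n/k} \le \epsilon/2$ and $4e^{-\epsilon^2 k/(4n)} \le 4e^{-16} < 0.1$, and the proof follows. □

Next, we need a (crude) version of inequality (3.7.1).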



(6.4) Lemma. Let $\epsilon > 0$, let $r(\epsilon) = pN(1 - \epsilon)/2$. Let $y \in C_N$ be a point. Then

$$P\{(x, l) \in \Omega_N : d_l(x, y) \le r(\epsilon)\} \le e^{-\epsilon^2 pN/4}.$$

Proof. Without loss of generality we may assume that $y = 0$. Then

$$P\{(x, l) \in \Omega_N : d_l(x, 0) \le r(\epsilon)\} = P\Big\{(x, l) \in \Omega_N : \sum_{i=1}^{N} \lambda_i \xi_i \le r(\epsilon)\Big\},$$

where $x = (\xi_1, \ldots, \xi_N)$ and $l = (\lambda_1, \ldots, \lambda_N)$. Let $\zeta_i = \xi_i \lambda_i$. Then $\zeta_i$, $i = 1, \ldots, N$,
are independent random variables such that $P\{\zeta_i = 1\} = p/2$ and $P\{\zeta_i = 0\} = 1 - p/2$. Hence

$$P\{(x, l) \in \Omega_N : d_l(x, y) \le r(\epsilon)\} = P\{\zeta_1 + \ldots + \zeta_N \le r(\epsilon)\} \le e^{-\epsilon^2 pN/4}$$

by a corollary of Chernoff's inequality (see [McDiarmid 89]). □
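The bound of Lemma 6.4 can be compared with the exact lower tail of a sum of $N$ independent Bernoulli variables with success probability $p/2$. The sketch below is only a numerical sanity check for $0 < \epsilon < 1$; the function names are hypothetical.

```python
import math

def lower_tail(N, p, eps):
    """Exact P{zeta_1 + ... + zeta_N <= pN(1 - eps)/2} for i.i.d. Bernoulli(p/2)."""
    s = p / 2
    r = p * N * (1 - eps) / 2
    return sum(
        math.comb(N, j) * s ** j * (1 - s) ** (N - j)
        for j in range(math.floor(r) + 1)
    )

def chernoff_bound(N, p, eps):
    """Right-hand side of Lemma 6.4: exp(-eps^2 p N / 4)."""
    return math.exp(-eps ** 2 * p * N / 4)
```

Comparing `lower_tail(N, p, eps)` with `chernoff_bound(N, p, eps)` over a grid of parameters shows how much room the crude bound leaves.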

Now we are ready to prove the first part of Theorem 4.5. 

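Proof of inequality (4.5.1). Let us choose a positive integer $m$, let $N = mn$, let
$C_N = (C_n)^m$, and let $\Lambda_N = (\Lambda_n)^m$. Let $B = A^m \subset C_N$ as in Lemma 6.3.
Let us choose an $\alpha > 0$. Applying Lemma 6.4, we obtain

$$P\{(x, l) \in \Omega_N : d_l(x, B) \le pN(1 - \sqrt{\alpha})/2\} \le |B| e^{-\alpha pN/4} = (|A| e^{-\alpha pn/4})^m.$$

Therefore, by Lemma 6.3

$$P\Big\{(x, l) \in \Omega_N : \frac{1}{m} \sum_{i=1}^{m} d_{l_i}(x_i, A) \le pn(1 - \sqrt{\alpha})/2\Big\} \le (|A| e^{-\alpha pn/4})^m.$$

The right hand side of the inequality tends to 0 provided $\alpha > 4 \ln |A|/pn$. Since by
the Law of Large Numbers

$$\frac{1}{m} \sum_{i=1}^{m} d_{l_i}(x_i, A) \to \Delta(A, p) \quad \text{in probability as} \ m \to +\infty,$$

we must have

$$\Delta(A, p) \ge pn(1 - \sqrt{\alpha})/2 \quad \text{for any} \ \alpha > 4 \ln |A|/pn.$$

Hence

$$\Delta(A, p) \ge pn(1 - \sqrt{\alpha})/2 \quad \text{for} \ \alpha = 4 \ln |A|/pn,$$

which is equivalent to (4.5.1). □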

In Section 3, we used the sharp isoperimetric inequality (Theorem 3.8) for the
Hamming distance in $C_n$ to get a sharp upper bound for $\log_2 |A|$. Unfortunately,
we don't know of a similar result for the randomized Hamming distance.
To prove (4.5.2)-(4.5.3), we proceed by induction on $n$ in a way resembling that of
[Talagrand 95] (see also Remark 6.9).

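We start with a simple technical result.

(6.5) Lemma. For any $0 \le \epsilon < 1$, any $\gamma > 0$ and any $0 < p < 1$ and $q = 1 - p$ we
have

$$\min\Big\{ \frac{p\gamma}{2} + \ln \frac{1}{1 + \epsilon}, \ p \ln \frac{1}{1 - \epsilon} + q \ln \frac{1}{1 + \epsilon} \Big\} \le \max\Big\{ 0, \ \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big\}.$$

Proof. Fixing $p$, $q$ and $\gamma$, let

$$f(\epsilon) = \frac{p\gamma}{2} + \ln \frac{1}{1 + \epsilon} \quad \text{and} \quad g(\epsilon) = p \ln \frac{1}{1 - \epsilon} + q \ln \frac{1}{1 + \epsilon}.$$

Then $f(0) > 0$ and $f(\epsilon)$ is decreasing whereas $g(\epsilon)$ behaves as follows: $g(0) = 0$ and
if $p \ge q$ then $g(\epsilon)$ is increasing and if $p < q$ then $g(\epsilon)$ is decreasing for $0 \le \epsilon \le q - p$
and increasing for $q - p \le \epsilon < 1$. Furthermore, $f(\epsilon_0) = g(\epsilon_0)$ at the single point
$\epsilon_0 = (e^{\gamma/2} - 1)/(1 + e^{\gamma/2})$, where $f(\epsilon_0) = g(\epsilon_0) = \ln(1 + e^{\gamma/2}) - q\gamma/2 - \ln 2$. The
proof now follows. □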
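The inequality of Lemma 6.5 is easy to probe numerically. The sketch below evaluates both sides on a grid of parameters; it is an illustration of the statement, not a substitute for the proof, and the names `lemma_lhs` and `lemma_rhs` are hypothetical.

```python
import math

def lemma_lhs(eps, gamma, p):
    """min{ p*gamma/2 + ln 1/(1+eps),  p ln 1/(1-eps) + q ln 1/(1+eps) }, 0 <= eps < 1."""
    q = 1 - p
    f = p * gamma / 2 + math.log(1 / (1 + eps))
    g = p * math.log(1 / (1 - eps)) + q * math.log(1 / (1 + eps))
    return min(f, g)

def lemma_rhs(gamma, p):
    """max{ 0,  ln(1 + e^{gamma/2}) - q*gamma/2 - ln 2 }."""
    q = 1 - p
    return max(0.0, math.log(1 + math.exp(gamma / 2)) - q * gamma / 2 - math.log(2))
```

The two branches of the minimum cross at $\epsilon_0 = (e^{\gamma/2} - 1)/(1 + e^{\gamma/2})$, which is where the left-hand side comes closest to the right-hand side when the latter is positive.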

(6.6) Definition. Let $\mu_n$ (or simply $\mu$) denote the uniform probability measure
in $C_n$. Hence $\mu(A) = |A|/2^n$.

The induction is based on the following lemma. 

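(6.7) Lemma. Let $A \subset C_{n+1}$ be a set. Let

$$A_0 = \{x \in C_n : (x, 0) \in A\} \quad \text{and} \quad A_1 = \{x \in C_n : (x, 1) \in A\}.$$

For $l \in \Lambda_n$ let $(l, 0) \in \Lambda_{n+1}$ denote $l$ appended by $\lambda_{n+1} = 0$ and let $(l, 1) \in \Lambda_{n+1}$
denote $l$ appended by $\lambda_{n+1} = 1$. Let

$$\Delta_0(A, p) = E\, d_{(l,0)}(x, A) \quad \text{and} \quad \Delta_1(A, p) = E\, d_{(l,1)}(x, A),$$

where the expectation is taken with respect to a random $(x, l) \in C_{n+1} \times \Lambda_n$. Then

$$(6.7.1) \qquad \frac{\mu_n(A_0) + \mu_n(A_1)}{2} = \mu_{n+1}(A);$$

$$(6.7.2) \qquad \Delta(A, p) = q \Delta_0(A, p) + p \Delta_1(A, p);$$

$$(6.7.3) \qquad \Delta_0(A, p) \le \Delta(A_i, p) \quad \text{for} \ i = 0, 1;$$

$$(6.7.4) \qquad \Delta_1(A, p) \le \Delta(A_i, p) + \frac{1}{2} \quad \text{for} \ i = 0, 1;$$

$$(6.7.5) \qquad \Delta_1(A, p) \le \frac{\Delta(A_0, p) + \Delta(A_1, p)}{2}.$$

Proof. Clearly, $|A_0| + |A_1| = |A|$, so (6.7.1) follows. Identity (6.7.2) is immediate
from Definitions 4.1. We observe that for any $x, y \in C_n$,

$$d_{(l,0)}((x, j), (y, i)) = d_l(x, y), \quad \text{where} \ i, j \in \{0, 1\}.$$

Hence

$$d_{(l,0)}((x, j), A) \le d_l(x, A_i), \quad i, j = 0, 1$$

and (6.7.3) follows by averaging.
Next, we observe that

$$d_{(l,1)}((x, i), (y, j)) = \begin{cases} d_l(x, y) & \text{if} \ i = j \\ d_l(x, y) + 1 & \text{if} \ i \ne j. \end{cases}$$

Therefore,

$$d_{(l,1)}((x, 1), A) = \min\{d_l(x, A_1), \ d_l(x, A_0) + 1\}$$

and

$$d_{(l,1)}((x, 0), A) = \min\{d_l(x, A_0), \ d_l(x, A_1) + 1\}.$$

Averaging over $(x, l) \in C_{n+1} \times \Lambda_n$, we get

$$\Delta_1(A, p) \le \frac{E\, d_l(x, A_1) + E\, d_l(x, A_1) + 1}{2} = \Delta(A_1, p) + \frac{1}{2}.$$

Similarly,

$$\Delta_1(A, p) \le \frac{E\, d_l(x, A_0) + 1 + E\, d_l(x, A_0)}{2} = \Delta(A_0, p) + \frac{1}{2},$$

which completes the proof of (6.7.4). Finally,

$$\Delta_1(A, p) = E\, d_{(l,1)}(x, A) = \frac{E\, d_{(l,1)}((x, 1), A) + E\, d_{(l,1)}((x, 0), A)}{2}$$

$$\le \frac{E\, d_l(x, A_1) + E\, d_l(x, A_0)}{2} = \frac{\Delta(A_1, p) + \Delta(A_0, p)}{2}$$

and (6.7.5) is proved. □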
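The relations (6.7.2)-(6.7.5) can be verified by exhaustive enumeration for a small set. The sketch below is a sanity check under the assumption that both slices $A_0$ and $A_1$ are non-empty; the helper names are hypothetical.

```python
from itertools import product

def d(l, x, A):
    """Randomized Hamming distance d_l(x, A) = min over y in A of sum of l_i over i with x_i != y_i."""
    return min(sum(li for li, xi, yi in zip(l, x, y) if xi != yi) for y in A)

def expected_distance(A, n, p, last=None):
    """E d_l(x, A) over uniform x in {0,1}^n; the coordinates of l are i.i.d. with
    P{1} = p, except that the last coordinate of l is fixed to `last` if given."""
    q = 1 - p
    free = n if last is None else n - 1
    total = 0.0
    for head in product((0, 1), repeat=free):
        l = head if last is None else head + (last,)
        weight = p ** sum(head) * q ** (free - sum(head)) / 2 ** n
        total += weight * sum(d(l, x, A) for x in product((0, 1), repeat=n))
    return total

def check_lemma_6_7(A, n_plus_1, p, tol=1e-12):
    """Check (6.7.2)-(6.7.5) for A in C_{n+1} with both slices non-empty."""
    n = n_plus_1 - 1
    A0 = [y[:-1] for y in A if y[-1] == 0]
    A1 = [y[:-1] for y in A if y[-1] == 1]
    D = expected_distance(A, n + 1, p)
    D0 = expected_distance(A, n + 1, p, last=0)   # Delta_0(A, p)
    D1 = expected_distance(A, n + 1, p, last=1)   # Delta_1(A, p)
    DA0 = expected_distance(A0, n, p)
    DA1 = expected_distance(A1, n, p)
    return {
        "6.7.2": abs(D - ((1 - p) * D0 + p * D1)) < tol,
        "6.7.3": D0 <= min(DA0, DA1) + tol,
        "6.7.4": D1 <= min(DA0, DA1) + 0.5 + tol,
        "6.7.5": D1 <= (DA0 + DA1) / 2 + tol,
    }
```

For instance, `check_lemma_6_7([(0, 0, 0), (1, 1, 0), (0, 1, 1)], 3, 0.3)` reports each of the four relations separately.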

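Now we use induction to get a preliminary bound.

(6.8) Lemma. Suppose that for some $\gamma > 0$, $0 < p < 1$ and $q = 1 - p$,

$$\ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \ge 0.$$

Then for any non-empty set $A \subset C_n$ we have

$$\gamma \Delta(A, p) + \ln \mu(A) \le n \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big).$$

Proof. We proceed by induction on $n$. If $n = 1$ then two cases are possible:

$A$ consists of a single point, $\mu(A) = 1/2$ and $\Delta(A, p) = p/2$ (see Example 4.2);
$A = \{0, 1\}$, $\mu(A) = 1$ and $\Delta(A, p) = 0$.

In both cases the inequality holds.

Suppose that the inequality holds for non-empty subsets of $C_n$. Let us prove
that it holds for non-empty $A \subset C_{n+1}$. Let us define $A_0, A_1 \subset C_n$ as in Lemma
6.7. From (6.7.1) it follows that either

$$\mu_n(A_0) = (1 - \epsilon)\mu_{n+1}(A) \quad \text{and} \quad \mu_n(A_1) = (1 + \epsilon)\mu_{n+1}(A)$$

or

$$\mu_n(A_1) = (1 - \epsilon)\mu_{n+1}(A) \quad \text{and} \quad \mu_n(A_0) = (1 + \epsilon)\mu_{n+1}(A)$$

for some $0 \le \epsilon \le 1$.

Let $B$ be the one of the sets $A_0$, $A_1$ that has a bigger measure $\mu_n$ (either of the
two if $\mu_n(A_0) = \mu_n(A_1)$) and let $D$ be the one of the sets $A_0$, $A_1$ that has a bigger
value of $\Delta(\cdot, p)$ (either of the two if $\Delta(A_0, p) = \Delta(A_1, p)$). Then

$$\mu_n(B) \ge (1 + \epsilon)\mu_{n+1}(A) \quad \text{and} \quad \mu_n(D) \ge (1 - \epsilon)\mu_{n+1}(A).$$

Furthermore, by (6.7.3)

$$\Delta_0(A, p) \le \Delta(B, p) \quad \text{and} \quad \Delta_0(A, p) \le \Delta(D, p)$$

whereas by (6.7.4) and (6.7.5)

$$\Delta_1(A, p) \le \Delta(B, p) + \frac{1}{2} \quad \text{and} \quad \Delta_1(A, p) \le \Delta(D, p).$$

Hence we get

$$\gamma \Delta_0(A, p) + \ln \mu_{n+1}(A) \le \gamma \Delta(B, p) + \ln \mu_n(B) + \ln \frac{1}{1 + \epsilon}$$

and

$$\gamma \Delta_1(A, p) + \ln \mu_{n+1}(A) \le \min\Big\{ \gamma \Delta(B, p) + \ln \mu_n(B) + \ln \frac{1}{1 + \epsilon} + \frac{\gamma}{2}, \ \gamma \Delta(D, p) + \ln \mu_n(D) + \ln \frac{1}{1 - \epsilon} \Big\}.$$

Clearly, $B$ is non-empty. Assume first that $D$ is non-empty as well. Applying the
induction hypothesis to $B$ and $D$, we conclude that

$$\gamma \Delta_0(A, p) + \ln \mu_{n+1}(A) \le n \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big) + \ln \frac{1}{1 + \epsilon}$$

and

$$\gamma \Delta_1(A, p) + \ln \mu_{n+1}(A) \le n \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big) + \min\Big\{ \ln \frac{1}{1 + \epsilon} + \frac{\gamma}{2}, \ \ln \frac{1}{1 - \epsilon} \Big\}.$$

Adding the first inequality multiplied by $q$ and the second inequality multiplied by
$p$ and using (6.7.2), we get

$$\gamma \Delta(A, p) + \ln \mu_{n+1}(A) \le n \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big) + \min\Big\{ \frac{p\gamma}{2} + \ln \frac{1}{1 + \epsilon}, \ p \ln \frac{1}{1 - \epsilon} + q \ln \frac{1}{1 + \epsilon} \Big\}.$$

The desired inequality follows by Lemma 6.5.

If $D$ is empty then $\mu_n(B) = 2\mu_{n+1}(A)$ and we obtain

$$\gamma \Delta_0(A, p) + \ln \mu_{n+1}(A) \le \gamma \Delta(B, p) + \ln \mu_n(B) - \ln 2$$

and

$$\gamma \Delta_1(A, p) + \ln \mu_{n+1}(A) \le \gamma \Delta(B, p) + \ln \mu_n(B) - \ln 2 + \frac{\gamma}{2}.$$

Adding the first inequality multiplied by $q$ to the second inequality multiplied by
$p$ and using (6.7.2) and the induction hypothesis, we get:

$$\gamma \Delta(A, p) + \ln \mu_{n+1}(A) \le \gamma \Delta(B, p) + \ln \mu_n(B) - \ln 2 + \frac{p\gamma}{2}$$

$$\le n \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big) + \Big( \frac{p\gamma}{2} - \ln 2 \Big) \le (n + 1) \Big( \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \Big),$$

which completes the proof. □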

Now we are ready to complete the proof of Theorem 4.5. 
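Proof of (4.5.2)-(4.5.3). By Lemma 6.8,

$$\frac{\ln |A|}{n} = \frac{\ln \mu_n(A)}{n} + \ln 2 \le \ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \frac{\gamma \Delta(A, p)}{n} = \ln(1 + e^{\gamma/2}) - \frac{\gamma}{2} + \gamma\rho$$

provided

$$\ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 \ge 0.$$

We optimize the inequality on $\gamma > 0$. Let

$$\gamma = 2 \ln \frac{1 - 2\rho}{2\rho}.$$

Since we assumed that $\rho < 1/4$, we have $\gamma > 0$. Furthermore,

$$\ln(1 + e^{\gamma/2}) - \frac{q\gamma}{2} - \ln 2 = \ln \frac{1}{2\rho} - q \ln\Big( \frac{1}{2\rho} - 1 \Big) - \ln 2$$

$$= -\ln(1 - 2\rho) + p\big(\ln(1 - 2\rho) - \ln(2\rho)\big) - \ln 2 \ge 0,$$

because of (4.5.2). Therefore,

$$\frac{\ln |A|}{n} \le \ln \frac{1}{2\rho} - \ln \frac{1 - 2\rho}{2\rho} + 2\rho \ln \frac{1 - 2\rho}{2\rho} = 2\rho \ln \frac{1}{2\rho} + (1 - 2\rho) \ln \frac{1}{1 - 2\rho}$$

and (4.5.3) follows. □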
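The optimization over $\gamma$ in the proof above can be spelled out by a short calculus step. The derivation below is a sketch added for the reader; the symbol $\varphi$ is introduced here only for illustration and denotes the upper bound for $n^{-1}\ln|A|$ obtained from Lemma 6.8 with $\rho = p/2 - \Delta(A,p)/n$.

```latex
\begin{align*}
\varphi(\gamma) &= \ln\left(1 + e^{\gamma/2}\right) - \frac{\gamma}{2} + \gamma\rho, \\
\varphi'(\gamma) &= \frac{1}{2}\cdot\frac{e^{\gamma/2}}{1 + e^{\gamma/2}} - \frac{1}{2} + \rho = 0
  \iff e^{\gamma/2} = \frac{1 - 2\rho}{2\rho}
  \iff \gamma = 2\ln\frac{1 - 2\rho}{2\rho}, \\
\varphi\left(2\ln\tfrac{1 - 2\rho}{2\rho}\right)
  &= \ln\frac{1}{2\rho} - \ln\frac{1 - 2\rho}{2\rho} + 2\rho\ln\frac{1 - 2\rho}{2\rho}
   = 2\rho\ln\frac{1}{2\rho} + (1 - 2\rho)\ln\frac{1}{1 - 2\rho}.
\end{align*}
```

The critical point is positive exactly when $\rho < 1/4$, which is where that hypothesis of Theorem 4.5 enters.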

(6.9) Remark. Our proof of (4.5.2)-(4.5.3) can be considered as an "additive"
version of Talagrand's method [Talagrand 95]. Indeed, Talagrand's approach very
roughly can be stated as follows. Let $\Omega$ be a space with the distance function
$d$ and probability measure $\mu$. To prove an isoperimetric inequality for $A \subset \Omega$, we
first find a uniform bound for the expression $\mu^{\alpha}(A) \cdot E \exp\{r\, d(x, A)\}$ and then
adjust parameters $\alpha > 0$ and $r > 0$. This way tight inequalities are obtained
in [Talagrand 95] for sets $A$ of large measure, most often with $\mu(A) \ge 1/2$. We
are mostly interested in sets of a small measure. One can check that for "small
sets" $A$ the inequalities of [Talagrand 95] are very far from sharp, which, of
course, should not be perceived as a "fault" of the method, since the method was
designed for totally different problems. We find a uniform bound for the expression
$\ln \mu(A) + \alpha E\, d(x, A)$, which looks like Talagrand's functional with "exp" removed.
Our method seems to produce reasonably good bounds for small sets $A$ but it fails
miserably for large $A$, with $\mu(A) = 1/2$, say. As should have been expected, the
case of "middle-sized" sets is the most complicated.

7. Concluding Remarks 

Connections to Monte-Carlo methods. The main idea of our approach can be
described as follows: given a (finite) ambient space $\Omega$ and a set $A \subset \Omega$, we estimate
the cardinality $|A|$ by choosing a certain distance function $d$ in $\Omega$ and estimating
the average distance

$$\Delta(A) = \frac{1}{|\Omega|} \sum_{x \in \Omega} d(x, A), \quad \text{where} \quad d(x, A) = \min_{y \in A} d(x, y),$$

from $x \in \Omega$ to $A$. We get the classical Monte-Carlo method if the distance function
$d$ is the simplest possible:

$$d(x, y) = \begin{cases} 1 & \text{if} \ x \ne y \\ 0 & \text{if} \ x = y. \end{cases}$$

In this case, $\Delta(A) = 1 - |A|/|\Omega|$, so there is a direct relation between $\Delta(A)$ and $|A|$. It
is well understood that the main difficulty with the Monte-Carlo method is that if
$|A|$ is very small compared to $|\Omega|$, it is hard to get an estimate for the cardinality of
$A$ different from 0. In other words, if $|A|$ is "exponentially small" compared to $|\Omega|$,
to get a non-trivial bound for $|A|$, we have to compute $\Delta(A)$ with exponentially high
precision. In this paper, we showed that in many interesting cases one can choose
a different distance function $d$, so that the distance $d(x, A)$ from a point $x \in \Omega$
to $A$ is efficiently computable and, to get a meaningful estimate of $|A|$ even for
exponentially small sets $A$, one needs to compute $\Delta(A)$ only with a polynomial precision.
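The contrast between the two choices of distance can be seen on a toy example. The sketch below computes the average distance exactly by enumeration for the 0/1 distance and for the Hamming distance on a small cube; the function names are hypothetical and the example is only an illustration. With the 0/1 distance, the average distance to $A$ is simply the fraction of points outside $A$.

```python
from itertools import product

def average_distance(A, n, dist):
    """Exact average over x in {0,1}^n of d(x, A) = min over y in A of dist(x, y)."""
    points = list(product((0, 1), repeat=n))
    return sum(min(dist(x, y) for y in A) for x in points) / len(points)

def trivial(x, y):
    """The 0/1 distance underlying the classical Monte-Carlo method."""
    return 0 if x == y else 1

def hamming(x, y):
    """The standard Hamming distance between binary strings."""
    return sum(a != b for a, b in zip(x, y))
```

For a one-point set in $\{0,1\}^{10}$ the 0/1 distance gives $1 - 2^{-10}$, which is hard to tell apart from 1 by sampling, while the Hamming distance gives $n/2 = 5$.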

Hence our approach can be considered as a natural extension of the Monte- 
Carlo method. In this context, economical embedding 2.4 can be considered as 
an analogue of the "importance sampling", whose objective is to replace a large 
ambient space by a smaller space containing A. 

Embedding in different metric spaces. Given a combinatorially defined family
$\mathcal{F} \subset 2^X$, we constructed its embedding into the Boolean cube $\{0, 1\}^n$ and investigated
what happens if the cube is endowed either with the standard Hamming distance
(Section 3) or with its randomized version (Section 4). In many cases, there are
different ways of metrization of $\mathcal{F}$. One example is provided by the set $\mathcal{F}$ of perfect
matchings in a given bipartite graph studied in the paper.

Let $G = (V^+ \cup V^-, E)$ be a bipartite graph with $V^+ = \{1^+, \ldots, n^+\}$,
$V^- = \{1^-, \ldots, n^-\}$ (cf. Example 1.2 and Section 5). For every vertex $i^+ \in V^+$, let

$$\Omega_i = \{e = (i^+, j^-) : e \in E\}$$

be the set of edges of $G$ coming out of $i^+$. Let

$$\Omega = \Omega_1 \times \ldots \times \Omega_n.$$

Every perfect matching in $G$ can be identified with a point in $\Omega$, so the set $\mathcal{F}$ of all
perfect matchings in $G$ is identified with a subset $F \subset \Omega$.

Let $d_i$ be a distance function on $\Omega_i$, $i = 1, \ldots, n$. Let us define the distance
function $d$ on $\Omega$ by

$$d(x, y) = \sum_{i=1}^{n} d_i(x_i, y_i), \quad \text{where} \ x = (x_1, \ldots, x_n) \ \text{and} \ y = (y_1, \ldots, y_n).$$

It is easy to check that for any $x \in \Omega$, $x = (x_1, \ldots, x_n)$, the distance $d(x, F)$ is
the minimum weight of a perfect matching in $G$ with weighting $\gamma(e) = d_i(e, x_i)$ for
$e = (i^+, j^-)$. Hence for any $x$, the value of $d(x, F)$ can be found in $O(n^3)$ time.

How should we choose $d_i$ to get the best possible estimates for the number $|F|$ of
perfect matchings in $G$?

The authors looked into some of the most obvious candidates, when $d_i$ is a graph
metric on $\Omega_i$ for a complete graph and for a path (circle). Interestingly, choosing
$\Omega_i$ isometric to a subset of a power of the complete graph with $m = O(\ln n)$ vertices
leads to an improvement by a logarithmic factor similar to that of the randomized
Hamming distance (Section 4). The choice of $d_i$ used in this paper comes from
identifying $\Omega_i$ with a subset of the Boolean cube $\{0, 1\}^{m_i}$ for $m_i = \lceil \log_2 |\Omega_i| \rceil$,
see Section 5. Perhaps one should use a whole family of distance functions $d_i$ and
combine the resulting estimates. The general isoperimetric inequality of [Alon et al.
98] may be very useful for that.

Weighted counting. Let $\mathcal{F} \subset 2^X$ be a family of subsets of the ground set
$X = \{1, \ldots, n\}$ and let $\mu(i) = p_i/q_i > 0$ be a rational weight of $i \in X$, where $p_i, q_i \in \mathbb{N}$.
Let us define

$$\mu(Y) = \prod_{i \in Y} \mu(i) \ \text{for} \ Y \in \mathcal{F} \quad \text{and} \quad \mu(\mathcal{F}) = \sum_{Y \in \mathcal{F}} \mu(Y).$$

We may be interested in estimating $\mu(\mathcal{F})$. There are several ways to extend our
methods to problems of this type; here we sketch one. For every $i \in X$, let
$m_i = \lceil \log_2(p_i + q_i) \rceil$. Let us choose subsets $A_i \subset C_{m_i}$ and $B_i \subset C_{m_i}$ such that $|A_i| = p_i$,
$|B_i| = q_i$ and $A_i \cap B_i = \emptyset$. Let $m = m_1 + \ldots + m_n$ and let us identify

$$C_m = C_{m_1} \times \ldots \times C_{m_n}.$$

For $Y \in \mathcal{F}$ let $Z_Y \subset C_m$ be the direct product of $n$ factors, the $i$-th factor being
$A_i$ if $i \in Y$ and $B_i$ if $i \notin Y$. Finally, let $F \subset C_m$ be the union of all $Z_Y$ for $Y \in \mathcal{F}$.
We see that $\mu(\mathcal{F}) = (q_1 \cdots q_n)^{-1} |F|$. Moreover, one can define subsets $A_i$ and $B_i$
in such a way that Optimization Oracle 1.1 for $\mathcal{F}$ gives rise to Distance Oracle 2.2
for $F$. This construction corresponds to the straightforward embedding (2.3). In
some cases, there is a way to come up with an economical embedding in the spirit
of (2.4).
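The identity $\mu(\mathcal{F}) = (q_1 \cdots q_n)^{-1}|F|$ can be checked on a small example. In the sketch below, which is an illustration under stated assumptions rather than the construction used for the oracles, the points of $C_{m_i}$ are encoded as integers, $A_i$ is taken to be $\{0, \ldots, p_i - 1\}$ and $B_i$ to be $\{p_i, \ldots, p_i + q_i - 1\}$; the function names are hypothetical and elements of the ground set are indexed from 0.

```python
from fractions import Fraction
from itertools import product
from math import prod

def embed(family, weights):
    """Union of the products Z_Y over Y in the family.
    weights[i] = (p_i, q_i); points of C_{m_i} are encoded as integers."""
    F = set()
    for Y in family:
        factors = [
            range(p) if i in Y else range(p, p + q)   # A_i if i in Y, else B_i
            for i, (p, q) in enumerate(weights)
        ]
        F.update(product(*factors))
    return F

def weighted_count(family, weights):
    """mu(F) = sum over Y of the product of p_i / q_i over i in Y, computed exactly."""
    return sum(
        prod((Fraction(weights[i][0], weights[i][1]) for i in Y), start=Fraction(1))
        for Y in family
    )
```

Because $A_i$ and $B_i$ are disjoint, the sets $Z_Y$ for distinct $Y$ do not overlap, so `len(embed(family, weights))` equals $q_1 \cdots q_n$ times `weighted_count(family, weights)`.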